PREDICTIVE VALUE
OF STANFORD-BINET PRESCHOOL ITEMS 


KATHERINE P. BRADWAY* 


San Francisco, Cal. 


The studies which have been made of the relation between 
intelligence quotients at the preschool level and those earned 
several years later indicate that existing preschool intelligence 
scales can not be relied upon to predict later IQ’s for individual 
subjects with the degree of accuracy which we generally demand 
for individual prediction. These findings challenge us to try to 
discover means of increasing the predictive value of preschool 
tests through more careful item selection. The approach which 
appears to hold the most promise is that of making an analysis of 
scales in terms of the correlation between success on each item, 
or each type of item, and terminal IQ. 

Nelson and Richards*’ have analyzed the predictive value of 
the individual items on the Gesell schedules for examining 
infants. They found marked differences between those items 
which correlated highest with total test score at the time of the 
initial examination and those which best predicted intelligence 
level reached one to two years later. Goodenough and Maurer® 
have made a comparative analysis of the Minnesota verbal and 
non-verbal preschool scales to determine which type of item 
best predicted later IQ. The results which they obtained showed 
that the Minnesota non-verbal items had higher predictive value 
than the verbal items did. Their findings with regard to the 
predictive value of individual items are being used as the basis for 
a revision of the Minnesota Preschool Scales (p. viii). 





* The author wishes to acknowledge her indebtedness to Professors Lewis 
M. Terman and Maud A. Merrill for making the data from the standardiza- 
tion of the Revised Stanford-Binet Scale available to her. 
The purpose of the present study was to make a preliminary
analysis of the preschool items on the Revised Stanford-Binet 
Scale in terms of their value in predicting later intelligence. The 
data which were used were obtained during the course of a pre- 
viously reported investigation which was concerned with the 
constancy of preschool IQ’s after a ten-year interval.' The 
subjects were initially examined on both forms of the Revised 
Stanford-Binet Scale when they were two to five-and-a-half years 
old; they were re-examined on Form L ten years later. With 
these data available we were able to use two approaches to the 
problem of the comparative value of the items. In the first place 
we prepared scales of different types of Stanford-Binet items and 
correlated scores on these scales with terminal IQ. Secondly, we 
computed biserial correlations between success and failure on 
each item and later IQ. 


SUBJECTS 


The subjects for this investigation were drawn from those 
children who were two to five-and-a-half years old when they were 
examined in California in 1930 to 1932 in connection with the 
standardization for the Revised Stanford-Binet Scale. Although 
a number of the children had moved in the interim, one hun- 
dred fifty-two of the original group of two hundred thirteen, or 
seventy-one per cent, were located in 1940. Thirteen of these 
subjects were eliminated because they had failed more than one 
item at the two-year, or lowest, age level on both forms of the 
initial Stanford-Binet examination. A fourteenth subject was 
eliminated because his birth-date could not be verified. The 
elimination of fourteen subjects reduced the total to one hundred 
thirty-eight. These one hundred thirty-eight subjects formed, 
according to the age at which they had been initially examined, 
two groups: the two- and three-year group consisted of fifty-two 
children initially examined at two, two-and-a-half, three, and 
three-and-a-half years of age; the four- and five-year group con- 
sisted of eighty-six children who had been initially examined at 
four, four-and-a-half, five, and five-and-a-half years of age. The 
means and standard deviations of the distributions of the initial 
IQ’s for the two groups were, respectively: 116 ± 17 and 105 ± 14.
The selection with respect to IQ’s of the two- and three-year 
group was a result of the elimination of the younger children who 
had failed more than one item at the lowest age level on both
forms of the initial examination. 

Form L of the Revised Stanford-Binet Scale was administered 
to the two groups approximately ten years after the initial 
examination. To insure accuracy of scoring, every instance 
of doubtful scoring was submitted to Dr. Merrill for final decision.


CONSTRUCTION OF FOUR SPECIAL SCALES 


As a basis for comparing different types of items with respect 
to their value in predicting terminal IQ, four scales made up of 
items from the first nine levels of the Revised Stanford-Binet 
Scale were used: verbal, non-verbal, memory, and number- 
concept. These scales are presented in Tables 1 (a), (b), (c), 
and (d). Items not used in any of the scales are presented in 
(e) of Table 1.* For purpose of the development of the verbal 


TABLE 1.—SPECIAL SCALES OF STANFORD-BINET ITEMS
(a) Verbal Scale


                                          Biserial r’s
Item      Location      Age Value      Prediction      Validity
II 
I ne cas Cao ee L, Il 1.9 
ll ee 1.8 
ss ocd ane caer L, Il 1.8 
Obj. by name............ M, II 1.7 
Pe Os. oS 3 0's bed een M, II 1.9 
, <. rrr 1.9 
II-6 
CRS, . ovo ciwepace me 2.0 
fC Sererererers 2.2 
0 re L, II-6 2.1 
Nn ous a ste uy aed L, II-6 2.4 
i i 2.3 
0 | Se ere M, II 2.1 
Os. by Ue8..........0... Mi R@.. 3.9 
PN Gi ceacicessuwest ae oe 


* Five items in Form M which are duplicates of items in Form L are 
omitted from Table 1. Six items at level VII which had age values above 
7.0 are also omitted. 
Age
Item Location Value 
II-6 
Pie Weeebis. n..55 0.3. TR 3.3 
aa M,II-6 2.3 
III 
a i 5s ate ata a L, III 2.7 
RS Srerrerrerere | eee 
Pic; voeeD:.<........1.... 3 2.7 
Obj. by use.............. M, III 2.7 
Ds ho dete eae M, III 2.8 
ITI-6 
ere oe 
ati sn se ea he L, II-6 3.1 
I ed sie s ba dee 84 M, III-6 3.4 
Ee ee M, IV 3.3 
6s os aru iche rage L, III-6 3.0 
TESS ere: le 
IV 
Ae el” ee 
ON ae rer re L, IV 3.5 
big aca saci M, IV 3.6 
SIP L, V 3.9 
Ee ee M, IV 3.9 
Nn ak sae ed M,1IV-6 3.7 
IV-6 
re a) 4.0 
Nn ecw a gibi) L, IV-6 4.2 
re 4.4 
ES ony eee L,IV-6 4.3 
adits ohne Gat M,IV-6 4.0 
cb iedss kos enens M, V 4.1 
V 
Dr OOD ss 3:4 bbs eh ass M, VI 4.9 
VI 
EE ee L, VI 5.6 
Ee ae M, VI 5.3 


TABLE 1.—(Continued) 


Biserial 
Predic- 
tion 


.65 

. 92** 

.29** 
1.00** 

. 62 


. 53 
. 54 
.40 
.42 
—" 
.66** 


.55 
. 68 
—_— 
. 26 
.94 
. 89 


. 53 
. 36 
47 
72 
.70 
.47 


. 55 
. 48 





rs 


Validity 


47 
. 84 
i | 
. 69 
.55 


81 
. 50 
. 60 
. 63 
. 83 
.82 


. 68 
. 56 
.49 
75 
. 62 
45 


.74 
. 66 
.69 
. 78 
.69 
. 76 


.61 


65 
. 68 

















Item 
VII 


Pie. ADGUP.. <6... 


Il 
Form board. , 


Block tower.... 
DEE nos a Seyane's 


II-6 
Form board 
Mot. Coord 


Str. beads...... 


Draw line 
Str. beads..... 
Block bridge 
Copy circle 
Form board 

III-6 


Draw CTOSS........cecee- 


Buttons 


te a cee oe 
es. sds aw one 
Str. beads...... 


Compl. man 


| eee 
Form discr..... 


IV-6 


BE 6 o's dius u»-0eln ae 


Animals....... 
Compl. bird 


Compl. man 


Se IG oiin ns 5-5 ERS 


seo 0 6 @ 


“eee 
TABLE 1.—(Continued)


Biserial 
Age  Predic- 
Location Value tion 
pobeke ae ae 6.9 .62** 
(b) Non-verbal Scale 
ae bai L, Il 1.8 
La pcan L, Il 1.9 
M, II 1.8 
oe Pen L, II-6 2.3 
tt eee M, II-6 3.1 
Fed con ue M, II-6 2.3 
bs te taae M, III 2.4 
Ao L, III 2.6 —.02 
a eee L, III 2.8 .21** 
od peda ae L, III 2.9 .65** 
a ae L, III 2.8  —.06** 
L, II-6 3.3 .16 
oa pee M,III-6 3.1 .18 
. M,III-6 3.3 .68** 
M, III-6 3.4 .46 
s deate Sai M, IV 3.4 15 
i aeae mee L, IV 3.6 .14 
. M, IV 3.8 . 69 
a L, IV 3.8 .50** 
L, V 4.4 24 
.M,IV-6 4.0 .49 
oan M,IV-6 4.0 .45** 
i. epee L, V 4.7 .19°* 
L, V 4.7 .44 
ita eee L, V 4.8 .47 
ee ee M, V 4.9 .08 





r’s 


Validity 


.49 


41 
53 
.35 
.45 


. 43 
. 66 
. 66 
74 
.33 


47 
64 
72 


43 
.60 
.67 


.48 
. 46 
.49 
.49 
Item
VI 


*Bead chain..... 
Bead chain...... 


VII 


Copy diamond..... 


II 


i 


II-6 


IRE os Stay'e i 
ere 


Bis... 5. 00s. 
, 2 ae 


0 ae 


Sentences....... 


Sentences....... 


VI 


*Bead chain..... 


Vil 


ae 
3 digits rev...... 


Concept: 2...... 


ere «ee 
66.38 =» oes 
ave 0 ¢ 0 8 8 
“Eu eet A we 


Commissions.......... 
ast Gee 


“e* © @ © © @ *@ 
o 26 oe Oe ‘é 
es 62682. 4-84 6 4 
S808 2.4.0. % 3 


“_* ee © © © @ © 


TABLE 1.—(Continued)


eee 


an 


M, VII 


(d) Number-Concept Scale 


Age 
Location Value 


6.8 
(c) Memory Scale 


i 


5. 


6. 


6 


Oo 


nw 


5 


6 
6 


3.4 


Biserial 
Predic- 
tion 


51 
.59 


— .45** 
. 36 
42 


74 


.33 
. 64 


. 66 
.49 
.50 
.55 
ol 


.45 


.43 





r’s 


Validity 


.58 
. 40 


42 
.61 
71 


. 68 


. 64 
.52 


.58 
51 
. 80 
70 
.58 


.60 


. 69 
TABLE 1.—(Continued)


                                                    Biserial r’s
Item                    Location      Age Value   Prediction   Validity
IV-6
  [illegible]           M, V             4.3         .80          .61
V
  [illegible]           L, V             4.6         .49          .57
VI
  No. concepts          L, VI            5.4         .50          .56
  No. concepts          M, VI            5.2         .63          .57
  [illegible]           [illegible]      5.4         .42          .49
VII
  [illegible]           M, VII           6.3
  [illegible]           M, VII           6.8

(e) Items Not Used in Special Scales
  [illegible]           L, II            1.7
  Obj. by name          L, II-6          2.0
  Comprehens’n          L, III-6         3.2         .48          .70
  Comprehens’n          M, III-6         3.0         .72          .76
  [illegible]           M, III-6         3.4         .41          .54
  Comprehens’n          L, IV            3.8         .81          .78
  [illegible]           M, IV-6          3.8         .35          .67
  Comprehens’n          M, IV-6          3.9         .75          .63
  [illegible]           L, IV-6          4.0         .25          .62
  Aesth. comp.          L, IV-6          4.0         .37          .64
  [illegible]           M, V             4.7         .30          .64
  Comprehens’n          M, V             4.7         .41          .64
  [illegible]           M, V             4.9         .51          .66
  [illegible]           L, VI            5.4         .42          .46
  [illegible]           L, VI            5.0         .35          .60
  [illegible]           [illegible]      5.3         .58          .73
  [illegible]           M, VI            5.5         .54          .52


* Items included in both non-verbal and memory scales. 
** Based on one r only. Other r’s in this column are means of two or 
more r’s. 


scale, verbal items were defined as those items which depend 
mainly upon verbal comprehension or verbal expression for suc- 
cess. Non-verbal items were those selected by Merrill as being 
non-verbal “or at least as being less verbal than the other tests
contained in Forms L and M” (pp. 142-144). The memory
items were those listed by McNemar as measuring “ ‘memory,’ or
more precisely ‘immediate memory’ ” (pp. 146-148). The
number-concept items consisted of those in which concept of 
numbers is tested directly or where the subject is asked to count. 

In making up scales of items which were to be used for com- 
parative purposes it was essential that the several scales be 
equated for difficulty. The first step in the procedure to achieve 
this was the computation of the mean age values of the items 
at year levels II through VII for Forms L and M. These age 
values, presented in column 3 of Table 1, were calculated from 
the table of per cents passing items by age prepared by McNemar 
from the standardization data’ (pp. 89-98). Use of the summa- 
tion method of computing averages* made it possible to compute 
directly from McNemar’s table the average age of passing for 
each item. On the basis of these values, items were selected for 
each half-year level from II through V and for the whole year 
levels VI and VII. In the non-verbal scale, for example, the 
non-verbal items having age values between 1.50 and 1.99 were 
placed at the lowest half-year level, corresponding to the II-year 
level; items having age values of 2.00 to 2.49 were placed at the 
II½-year level, and so on. The same procedure was followed
in the construction of the memory and number-concept scales. 
Because there were considerably more items for the verbal scale 
than for the other scales, it was possible to apply additional 
criteria in the selection of items for the verbal scale. The verbal 
items chosen were those which would result in mean age values 
for the successive levels which would approximate the mean age 
values at the corresponding levels of the non-verbal scale. The 
mean age values at corresponding age levels of the resulting 
verbal and non-verbal scales were within one-tenth year of one 
another. 
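
The summation method is not spelled out in the text. As a minimal sketch, assuming it takes the familiar form in which each rise in the proportion passing between two adjacent ages is credited to the midpoint of that interval, the average age of passing can be computed as below. The function name and the proportions are hypothetical and are not McNemar's figures.

```python
def mean_age_of_passing(ages, prop_passing):
    """Estimate the mean age (in years) at which an item is passed.

    ages          -- test ages in ascending order
    prop_passing  -- proportion passing at each age; assumed (for this
                     sketch) to start at 0.0 and to reach 1.0
    """
    if len(ages) != len(prop_passing) or len(ages) < 2:
        raise ValueError("need matching lists of at least two ages")
    if prop_passing[0] != 0.0 or prop_passing[-1] != 1.0:
        raise ValueError("proportions must run from 0.0 to 1.0")

    total = 0.0
    for i in range(1, len(ages)):
        # Share of children who first pass somewhere in this interval.
        gain = prop_passing[i] - prop_passing[i - 1]
        # Credit that share to the middle of the interval.
        midpoint = (ages[i] + ages[i - 1]) / 2
        total += gain * midpoint
    return total


# Hypothetical item: nobody passes at 1.5 years, everybody at 3.5 years.
example_ages = [1.5, 2.0, 2.5, 3.0, 3.5]
example_props = [0.0, 0.2, 0.6, 0.9, 1.0]
example_age_value = mean_age_of_passing(example_ages, example_props)
```

With these invented proportions the age value comes to about 2.4, so the item would be placed at the II½-year level of a special scale.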

Although no selection among the memory items was exercised 
in the formation of the memory scale, the means of the age 
values at each level corresponded closely to the mean age values of 
the verbal and non-verbal scales. 

There were only eight items which could be used for the num- 
ber-concept scale. The lowest age level for which there was a 
number-concept item was at 3.0 to 3.9 years. The number of 





* Descriptions of this method as applied to scale items may be found in 
Thomson® and Bradway’. 
items was so small that it was impossible to divide the age levels
into half years. The mean age values for this scale showed 
larger deviations from the verbal scale mean values than did 
those of either the non-verbal or memory scales. 

Age credits were assigned to the several items at each age level 
according to the number of items at that level. For example, 
each of the six items at the II-year level of the verbal scale was 
assigned an age credit of one month, each of the ten items at 
the II-6 year level was assigned an age credit of .6 months, each 
of the five items at the III-year level was assigned a credit of 1.2
months, and so on. In using the scales a basal of eighteen 
months’ credit for the verbal, non-verbal, and memory scales, 
and a basal of thirty-six months’ credit for the number-concept 
scale were assumed; proper credits were added to the basal 
according to the items passed. 
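
The crediting rule can be sketched as follows. The sketch assumes that each half-year level carries six months of credit divided equally among its items, which agrees with the three examples just given, and that the ratio is expressed as 100 times the age score over chronological age. The factor of 100, the span argument for whole-year levels, and all names are assumptions made for illustration.

```python
def item_credit(n_items, span_months=6.0):
    """Months of credit earned by one item at a level of n_items items."""
    return span_months / n_items


def age_score_months(basal_months, levels):
    """Add item credits to the basal.

    levels -- list of (n_items_at_level, n_items_passed, span_months)
    """
    score = basal_months
    for n_items, n_passed, span_months in levels:
        score += n_passed * item_credit(n_items, span_months)
    return score


def scale_ratio(score_months, chronological_age_months):
    """Age score over chronological age, scaled to resemble an IQ."""
    return 100.0 * score_months / chronological_age_months


# Hypothetical child aged 30 months on the verbal scale:
# all 6 items at II, 8 of 10 at II-6, 2 of 5 at III.
verbal_levels = [(6, 6, 6.0), (10, 8, 6.0), (5, 2, 6.0)]
verbal_score = age_score_months(18.0, verbal_levels)   # about 31.2 months
verbal_ratio = scale_ratio(verbal_score, 30.0)         # about 104
```

The basal of eighteen months is the one stated for the verbal, non-verbal, and memory scales; the number-concept scale would start from thirty-six months instead.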


APPLICATION OF THE SCALES TO THE RETEST GROUP 


The initial Form L and Form M Stanford-Binet test results of 
the retest group of one hundred thirty-eight subjects were used to 
obtain scores on the four scales described above. The means 
and standard deviations of the resulting ratios* are indicated in 
Table 2. For comparison the means and standard deviations of 
the initial composite IQ’s and the retest Form L IQ’s are also 
indicated. Because of the absence of items at the lower end, the 
number-concept scale could not be used for enough children at the 
two- and three-year level to justify the computation of means. 
At the four- and five-year level there were eight children who 
were below the level of the number-concept scale and, therefore, 
could not be included in the statistics for this scale. 

In Table 2 it will be observed that at the younger age level 
the mean non-verbal ratio was considerably below the other
ratios.† No explanation for the selectivity of the sample with
respect to this factor was found. With this exception the means 
for each age group were within a few points of one another. The 
means at the younger age level were higher than those for the 





* These ratios are comparable to IQ’s to the extent that they represent the
ratio between an age score and chronological age. 

† Application of the special scales to five age levels of the total
standardization sample yielded non-verbal ratios between 98 and 105.
These compared closely with ratios for the three other scales and with
Form L and Form M IQ’s for corresponding age levels.
older group due to the initial elimination, previously mentioned,
of the retarded children at the younger age level. 

The next step was the computation of the correlations between 
ratios obtained on the special scales at the preschool level and 
the IQ’s obtained when these subjects were retested on Form L 
ten years later. These correlations are indicated in Table 3 
together with the correlations for the IQ’s based on Forms L and 
M and a composite of the two.* Correlations between the ratios 
and the initial composite IQ’s are also shown. Inspection of the 
data for the two- and three-year group shows that the correla- 
tion of .62 for the verbal scale approximated the correlations for 
the full scales. In fact, it was slightly higher than the correla- 
tion for Form L in spite of the fact that there are only forty-three 
items in the verbal scale as compared with fifty-four in Form L 
for the corresponding age levels. It will also be noted that the 
correlation for the memory scale was higher than that for the 
non-verbal scale even though there are only sixteen items in 
the memory scale as against twenty-nine in the non-verbal scale. 


TABLE 2.—MEANS AND STANDARD DEVIATIONS OF SPECIAL SCALE
RATIOS, INITIAL COMPOSITE IQ’s AND RETEST IQ’s OF
TWO AGE GROUPS

                            Age 2 and 3 Years      Age 4 and 5 Years
                                 N = 52                 N = 86
                             Mean       σ           Mean       σ
Verbal Ratio                 117       19           106       18
Non-Verbal Ratio             108       16           103       14
Memory Ratio                 118       26           104       19
Number-Concept Ratio         ...       ...          108*      14
Initial Composite IQ         116       17           105       14
Retest IQ                    115       18           108       17

* N was 78.


Comparison of the correlations for the four- and five-year 
group shows that the memory scale predicted later IQ better than





* These results were obtained in an earlier study’. 








either the verbal or non-verbal scales did. Moreover, the correlation
for the memory scale was within two points of that for the
full scale of Form M despite the fact that the memory scale con- 
tains less than one-third the number of items which are included 
in Form M for the corresponding age levels. The correlation for 
the verbal scale was five points higher than that for the non-ver- 
bal scale. The correlation for the number-concept scale with 
only eight items was identical with that for the non-verbal scale 
which was made up of twenty-nine items. 


TABLE 3.—CORRELATIONS FOR SPECIAL SCALE RATIOS AND
INITIAL FORM L, FORM M, AND COMPOSITE IQ’s WITH
RETEST IQ’s FOR TWO AGE GROUPS

                          Age 2 and 3 Years        Age 4 and 5 Years
                               N = 52                   N = 86
                          r with      r with       r with      r with
                          Comp.       Retest       Comp.       Retest
                          IQ*         IQ           IQ*         IQ
Verbal Ratio               .93         .62          .79         .50
Non-Verbal Ratio           .78         .45          .76         .45
Memory Ratio               .84         .52          .78         .61
No.-Concept Ratio          ...         ...          .67†        .45†
Initial Form L IQ          ...         .58          ...         .67
Initial Form M IQ          ...         .67          ...         .63
Initial Comp. IQ           ...         .66          ...         .67

* Spuriously high because items in special scales also included in composite
scale.
† N was 78.


It was found that the larger number of items in the verbal 
scale as compared with that in the non-verbal scale was not 
responsible for the higher predictive value of the former scale. 
When the number of items at each level of the verbal scale was 
reduced to the number of items at the corresponding level of the 
non-verbal scale, correlations for the verbal scale were obtained 
which showed a drop of only two points for the younger group 
and no change for the older group. 
The results presented here may be contrasted with those found
by Goodenough and Maurer in their study of the predictive value 
of the Minnesota verbal and non-verbal scales. The correlations 
between Minnesota verbal and non-verbal IQ-equivalents at
thirty-six to forty-seven months and 1937 Stanford-Binet IQ’s 
obtained nine years later were .43 and .65, respectively,* (p. 80). 
These values are practically reversed for the two- and three-year
group in the present study in which the correlations for the verbal 
and non-verbal scales are .62 and .45, respectively. Goodenough 
and Maurer’s correlations between Minnesota verbal and non- 
verbal IQ-equivalents at forty-eight to sixty months and 1937 
Stanford-Binet IQ’s obtained approximately ten years later were
.50 and .51, respectively’ (p. 80). The value for the Minnesota 
verbal scale is identical with that found for the Stanford-Binet 
verbal scale for our four- and five-year group, but the value for 
the Minnesota non-verbal scale exceeds that for the Stanford- 
Binet non-verbal scale. Goodenough and Maurer’s results for 
reexaminations after shorter intervals than nine and ten years 
showed a more marked superiority of the predictive value of the 
non-verbal scale than do those of their results reported here for 
the length interval most nearly corresponding to that obtaining 
in the present study. 


BISERIAL CORRELATIONS OF PRESCHOOL ITEMS 


The data at hand permitted the calculation of biserial correla- 
tions between success on items at the preschool level and terminal 
IQ. Biserial r’s were calculated not only for the age level at 
which the item appears in the Stanford-Binet Scale, but for each 
level for which the per cent passing the item was between twenty-five
and seventy-five. No r’s were calculated for the two-year
age group because there were only seven subjects in this group.
The number of subjects in the other age groups ranged from 
thirteen at three years to twenty-six at five-and-a-half years. 
After the biserial r’s were computed, the means of the r’s for each 
item were calculated.* These mean r’s are presented in the next
to the last column in Table 1. The r’s which are based on only
one year group and, therefore, are not means are identified by a 





* The r’s were first converted to Z’s (4 Table 14); the mean Z’s were then 
converted to r’s. 
double asterisk. The number of subjects on which each of the
mean correlations was based ranged from thirty-one to eighty- 
six. The validity r’s, or biserial r’s between passes and failures 
on these items and composite IQ at the time of the initial examina- 
tion for the standardization sample,® are presented adjacent to 
the prediction r’s for comparison. 
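
The two computations described here, a biserial r for each item and a mean of several r’s taken through the Z transformation, can be sketched as follows. The sketch assumes the textbook biserial formula (difference between the mean scores of passers and failers, divided by the standard deviation of all scores, times pq over the ordinate of the unit normal curve at the point of division). The paper does not state which computing form was used, and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def biserial_r(passed, scores):
    """Biserial r between a pass/fail item and a continuous score."""
    pass_scores = [s for ok, s in zip(passed, scores) if ok]
    fail_scores = [s for ok, s in zip(passed, scores) if not ok]
    if not pass_scores or not fail_scores:
        raise ValueError("item must be both passed and failed in the group")

    p = len(pass_scores) / len(scores)   # proportion passing
    q = 1.0 - p                          # proportion failing
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve where it is cut into p and q.
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    spread = pstdev(scores)              # S.D. of the whole group
    return (fmean(pass_scores) - fmean(fail_scores)) / spread * (p * q / y)


def mean_r_via_z(rs):
    """Average several r's by converting to Z, averaging, converting back.

    An r of exactly 1.00 or -1.00 cannot be transformed and raises
    ValueError from math.atanh.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(fmean(zs))


# Hypothetical age group of eight children: item result and later IQ.
item_passed = [1, 1, 0, 1, 0, 0, 1, 0]
later_iq = [125, 118, 112, 108, 104, 99, 95, 90]
r_item = biserial_r(item_passed, later_iq)

# Hypothetical r's for one item at two age levels.
r_mean = mean_r_via_z([0.30, 0.70])
```

Because the biserial coefficient rests on an assumed normal distribution behind the pass/fail split, it can reach or exceed 1.00 in small groups, which is one reason for restricting the calculation to levels where twenty-five to seventy-five per cent pass.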

Inspection of the prediction r’s shows that those for the non- 
verbal items tend to be lower than those for the other items. 
There are only three non-verbal items with mean r’s above .50;*
namely, “Discrimination of animals” and the two “Copying
bead chain” tests. Four additional items have mean r’s in the
.40’s, and the remaining seven mean r’s are below .30. The
majority of the r’s for the verbal items, on the other hand, are
above .50. Four are .70 or above; namely, “Definitions,” two
“Opposite analogies” items, and “Materials.” Eight items
have mean r’s between .50 and .70, five between .40 and .50, and 
only three have mean r’s below .40. Thus of the fourteen items 
of the non-verbal scale having mean r’s, only twenty-one per 
cent have r’s above .50 and fifty per cent have r’s below .40, 
whereas of the twenty verbal items which have mean r’s, sixty 
per cent have r’s of .50 or above, and only fifteen per cent have 
r’s below .40. 

A consideration of the mean r’s for the memory items (Table 
1(c)) shows that these items have a relatively high predictive 
value. Of the eleven items which have mean r’s, seven, or 
sixty-four per cent, have r’s of .50 or above, and only two, or 
eighteen per cent, have r’s below .40; namely, “Picture memories”
and “Naming objects from memory.” The r’s for the three sentence
memory items are between .55 and .74. The r’s for the
four “Digits forward” items are between .45 and .72.

The six mean r’s for the number-concept items are all above
.40, ranging from .42 to .80. The three below .50 are for “Number
concept of two,” “Counting four objects,” and “Counting
thirteen pennies.” The number-concept items rank with the
memory items in their predictive value in contrast with the 
lower predictive value of the non-verbal items. The median 
biserial r’s for the four scales in descending order are: .68 for 





* The r’s based on only one age level are not included in the following 
discussion because their reliability is too low to justify inclusion. 
verbal items, .62 for memory items, .57 for number-concept
items, and .49 for non-verbal items. 

In Table 1(e) are presented the r’s for items not used in any of 
the special scales. Of these fifteen items the three which have 
the highest mean r’s are all comprehension items with r’s between 
.72 and .81. Mean r’s for five of the items are below .40; namely,
“Patience: pictures,” “Pictorial likenesses and differences,”
“Aesthetic comparison,” “Pictorial similarities and differences,”
and “Maze tracing.” None of these items demands a verbal
response on the part of the child, and it is probable that the 
understanding of the verbal directions is less difficult than the 
carrying out of the performance so that they can be said to be 
more ‘non-verbal’ than ‘verbal’ in type. 

A comparison of the validity biserial r’s, that is, the r’s between 
success on an item and total score at the time of the standardiza- 
tion, and the r’s discussed above which show predictive value, 
was made by computing the Pearson product moment correlation 
between the Z’s for the two sets of sixty-six r’s. The r’s based 
on only one age level and indicated with a double asterisk in 
Table 1 were not included in this correlation. The resulting 
correlation was .30.* The mean validity r was .63, whereas the 
mean prediction r was .52. It does not necessarily follow that if 
an item has a high validity as measured by its correlation with 
the test as a whole, it also has a high predictive value, or vice 
versa. The most striking examples of discrepancy between the 
two r’s are found in “Response to pictures I” (M, VI, 4) which
has a validity r of .61 and a prediction r of -.05; “Sorting buttons”
which has a validity r of .61 and a prediction r of only .18;
and “Number concept of three” which has a validity r of .61 and
a prediction r of .80. 
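
The comparison of the two sets of r’s can be sketched as a product moment correlation between their Z values. The pairs below are the fifteen prediction and validity r’s of Table 1(e) as far as the print can be read; they are only a subset of the sixty-six pairs used in the study, so the result is an illustration and is not expected to reproduce the reported .30. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (prediction r, validity r) for the items of Table 1(e); a subset only.
pairs = [
    (.48, .70), (.72, .76), (.41, .54), (.81, .78), (.35, .67),
    (.75, .63), (.25, .62), (.37, .64), (.30, .64), (.41, .64),
    (.51, .66), (.42, .46), (.35, .60), (.58, .73), (.54, .52),
]

# Convert each r to Fisher's Z before correlating the two columns.
z_prediction = [math.atanh(pred) for pred, _ in pairs]
z_validity = [math.atanh(val) for _, val in pairs]
r_between = pearson(z_prediction, z_validity)
```

Working in Z rather than r keeps the large coefficients from being compressed toward the upper limit of the scale before the two columns are compared.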

The number of cases in the present study is not adequate, of 
course, to establish true values of the prediction r’s, but the 
differences between the validity and prediction r’s are sufficient 
to indicate the need of going beyond item correlation with total 
test score in selecting items for scales which are to be used for 
long-time predictions. Additional studies of the same nature 
will be required to check the stability of the values secured in 
this study before they can be validly used as a basis for item 
selection. 





* This correlation is attenuated due to sampling errors. 
SUMMARY


1) The purpose of the study was to make a preliminary analy- 
sis of the predictive value of the preschool items of the Revised 
Stanford-Binet Scale. 

2) Verbal, non-verbal, memory, and number-concept scales 
were prepared from the items appearing at year levels II through 
VII on the two forms of the Stanford-Binet Scale. 

3) The Stanford-Binet scales administered to one hundred 
thirty-eight children when they were between the ages of two 
and five-and-one-half years were rescored according to the 
four special scales. The obtained age scores were converted to 
ratios by dividing by the corresponding chronological ages. 

4) The ratios for each of the four scales were correlated with 
Stanford-Binet IQ’s obtained ten years later for the group of one
hundred thirty-eight children. 

5) The results indicated that the verbal and memory scales 
predicted later intelligence better than the non-verbal scale did. 

6) Biserial r’s were computed between success on each item 
at the preschool level and terminal IQ. 

7) Comparison of individual biserial r’s indicated that the 
verbal and memory items generally had higher predictive value 
than the non-verbal items. 

8) The relation between the validity of an item as measured 
by its correlation with the test as a whole and the predictive value 
of an item as measured by its correlation with terminal IQ was 
found to be positive but low. 

9) Discrepancies between validity and predictive value of 
individual items indicated the need for using the predictive value 
of items as a basis for selection for intelligence scales rather than
relying solely upon correlation with total test score as an index of 
item value. 
CHANGES IN GOODENOUGH IQ
AT THE PUBLIC SCHOOL KINDERGARTEN LEVEL 


GELOLO McHUGH 
Barnard College 


When data were gathered for a monograph, “Changes in IQ at 
the Public School Kindergarten Level,’’* dealing with changes in 
scores earned by public school kindergarten children on the 1937 
Revision of the Stanford Binet Test*® over a period from two 
weeks prior to entrance to kindergarten to three months after 
entrance, the Goodenough Drawing a Man Test' was also 
administered to each subject. This test was included because 
it is a recognized non-verbal test of intelligence and because the 
author entertained an hypothesis that changes in IQ on the Stan- 
ford Binet test after kindergarten experience might be due in 
large part to the subjects’ lack of facility with speech, or to 
inhibition of usual speech behavior as a result of strangeness et 
cetera in the initial test situation. It was felt that the Good- 
enough test might support this hypothesis if found to be relatively 
uninfluenced by other factors which operate to depress initial 
Binet intelligence test scores below level of ability at this age 
level. The monograph mentioned above‘ (page 20) has sup- 
ported the hypothesis that changes in Stanford Binet IQ at the 
kindergarten level are due in part to improvement in the use of 
oral speech in the testing situation but, when the data for the 
entire study had been completed, no support for this hypothesis 
was found in the Goodenough scores. In fact, the Goodenough 
scores varied as much in the direction of improvement in score 
from test to retest after a short period of kindergarten experience 
as had the Binet scores. In view of this finding, it was decided 
to deal with the Goodenough scores in a separate publication 
where the significance of their changes could receive a more 
thorough treatment and interpretation. 


THE SUBJECTS 


The subjects of this study have been described in detail in 
“Changes in IQ at the Public School Kindergarten Level.’’ 
Briefly, complete Goodenough scores were secured on eighty-three
of the ninety-one kindergarten children who made up the
population of the whole study. All of the subjects lived in the same
town and attended the same school system. Goodenough test 
scores were obtained on ninety of the ninety-one subjects at the 
time of the second Binet test. The group is about equally 


TABLE 1.—INITIAL AND FINAL CHRONOLOGICAL AGES


                      Test I            Test II
Range                 56-75 months      57-76 months
[illegible]           61 months         63 months
[illegible]           62.02 months      64 months
[illegible]           4.38 months       3.97 months


divided between the sexes and is homogeneous as to chronological 
age. Table 1 gives essential statistical data as to the CA’s of 
the subjects. A frequency distribution of CA’s will be found in
the above mentioned report (page 11).‘ 


ADMINISTRATION OF THE GOODENOUGH TESTS 


The Goodenough tests of this study were administered to each 
subject individually at the end of each Binet test period. All 
initial tests were administered during two weeks prior to entrance 
to kindergarten and final tests were administered either one, two 
or three months after the initial test had been given. Initial 
and final tests were administered by the same examiner in every 
case. When the Binet test had been completed, the examiner
placed a fresh sheet of 8½” x 11” paper before the subject,
with a short pencil he had already used during the Binet test,
and said, “And now I want you to make a picture of a man.
Take your time and be careful. Make the very best picture of a
man that you can.” Each subject was allowed to work at his
own speed and when he indicated that the drawing was finished
to his satisfaction it was numbered and removed from sight.
The examiner then placed a second sheet of paper before him and
said, “I want you to make just one more good picture of a man
and then we will be through working. Take your time and work
carefully.” In the final assembly of scores on these drawings the
subject was credited with whichever was better of the two trials 
he had been allowed per testing. Comparisons between trials 
will be reported under Results.


SCORING THE GOODENOUGH TESTS


At the time of the administration of the Drawing A Man Test 
each result was coded and filed in order that in the scoring, which 
did not begin until all data were complete, the examiner would 
not know where an individual drawing belonged in the order of 
administration or to which subject it belonged. All tests were 
scored by one examiner who recorded scores on prepared record 
sheets so that it was unnecessary to mark the drawings in any 
way. This procedure made it possible to check the examiner’s 
scoring by having another person, familiar with the test, rescore 
some of them. Two such checks on the examiner’s scores were 
made. In one of these, forty drawings were withdrawn at ran- 
dom from the group and submitted to Dr. Adella Youtz, who had 
had a considerable amount of experience with the test. A posi- 
tive r of .94 was obtained between Dr. Youtz’s scores and the 
first scoring of these forty tests. For a second check, all of the 
drawings administered at the time of the second test were given 
to a Barnard College senior Psychology major for scoring as part
of a term project. A positive r of .92 was obtained between these
second scores and the scores of the examiner. In view of these 
results, the initial scores of all drawings have been used in this 
report. 


OTHER TEST DATA USED IN THIS REPORT 


All other test data used in the present report have been taken
from “Changes in IQ at the Public School Kindergarten Level”
(4, pp. 26-29), where full details as to method of collection and
scoring are given. All socio-economic data and information with
regard to the education of parents were gathered by one examiner
during home visits. A small number of cases were lost because
parents failed to keep appointments. Where correlations
between the Goodenough scores and these data are reported
(Table 8), the size of the N will indicate that a few cases are
missing and that the corresponding Goodenough scores had to be
omitted. A survey of the small number of missing cases in
relation to Binet and other scores indicates that the results
obtained with the available cases would have been strengthened
in the same direction if the missing cases could have been added.
This fact will be partially supported in the discussion of results.


RELATIONSHIP BETWEEN GOODENOUGH AND STANFORD-BINET
SCORES


Table 2 shows the relationships obtained between the
Goodenough and Binet scores for the eighty-three subjects on whom
complete records were secured at the time of the first test and on
ninety subjects who furnished complete data at the time of the
second test.


TABLE 2.—RELATIONSHIP BETWEEN GOODENOUGH TEST SCORES AND
STANFORD-BINET TEST SCORES, INITIAL AND FINAL TESTS,
AND BETWEEN CHANGES IN BOTH TESTS' SCORES
FROM TEST TO RETEST

                      1st Binet   2nd Binet   1st Binet   2nd Binet   Change in   Change in
                      MA          MA          IQ          IQ          Binet MA    Binet IQ
1st Goodenough MA     .41 ± .06
2nd Goodenough MA                 .45 ± .06
1st Goodenough IQ                             .387 ± .05
2nd Goodenough IQ                                         .40 ± .05
Change in Goodenough
  MA                                                                  .16 ± .06
Change in Goodenough
  IQ                                                                              .17 ± .06
N =                   83          83          90          90          83          83


A significant relationship between Goodenough and Binet 
scores is shown in Table 2, but the r’s obtained between Good- 
enough MA’s and the 1937 Revision of the Stanford Binet MA’s 
for subjects with mean CA’s of approximately five years do not 
compare favorably with Goodenough’s r of .70 (', Table 9, 
page 50) between 1916 Stanford-Binet MA’s and Goodenough 
MA’s of ninety-four children five years of age. The 1937 Binet
scores were obtained by testing half of the subjects on Form L
and half on Form M. This may partially account for the
difference in r’s here.
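The second figure attached to each coefficient in Table 2 is presumably its probable error, the usual convention in reports of this period; the source does not state the formula. A minimal sketch, assuming the classical expression PE = 0.6745 (1 - r²) / √N, shows how such a figure would be obtained. The function name is illustrative only.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Two of the Table 2 coefficients, each with the N reported for that testing.
pe_first_ma = probable_error_of_r(0.41, 83)   # 1st Goodenough MA with 1st Binet MA
pe_second_ma = probable_error_of_r(0.45, 90)  # 2nd Goodenough MA with 2nd Binet MA

print(round(pe_first_ma, 2), round(pe_second_ma, 2))
```

Under this assumption both coefficients carry a probable error that rounds to .06, in agreement with the tabled entries for the two MA correlations.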

It is interesting to note that there is very little relationship 
between changes in Goodenough scores and changes in Binet 
scores after kindergarten experience. The interpretation of 
these low relationships between changes in scores after kinder- 
garten experience is difficult. It is proposed here that they may 
indicate that changes in scores of the two tests depend very little 
upon similar influences in the kindergarten environment which 
have operated to relieve whatever factors served to depress the 
initial scores; and that initial scores of kindergarten children on 
these two tests may be depressed below the real ability of the 
subjects by dissimilar experiences or lack of experiences during 
the pre-kindergarten years. This proposal appears to be
supported somewhat by Table 8, which shows the relationship
obtained between rough measures of the environment from which
the subjects came to the initial testing situation and Goodenough
scores for both first and second tests.


CHANGES IN GOODENOUGH MA AND IQ 


The Goodenough test was administered to each subject twice 
in succession at the times of the initial and final tests. A positive 
r of .91 was obtained between the eighty-three pairs of scores for 
test one. For trial one of test one, the mean raw score was 7.12, 
SD 4.08 points. The mean of trial two, test one, was 7.12, SD 
3.73 points. A positive r of .86 was obtained between the 
ninety pairs of scores for test two. For trial one, test two, the 
mean raw score was 9.39, SD 3.43 points. The mean of trial 
two, test two, was 9.17, SD 3.44 points. 

Complete Goodenough scores were obtained on eighty-three 
of the ninety-one cases for whom Binet data were secured. At 
the time of the initial tests seven children refused to try on the 
Goodenough test. One subject’s results had to be omitted 
because of the examiner’s failure to administer the Goodenough 
test at the time of the second Binet. A survey of Binet test 
results for the seven children who refused to try on the initial 
Goodenough test shows that, while one child registered no change 
in Binet scores from test to retest, the remaining six show a mean 
gain of 12.7 months in Binet MA and a mean gain of 17 points in 
Binet IQ. With significant relationships existing between initial
Binet and Goodenough scores as shown in Table 2, there seems
to be in this fact of large gains in Binet score from test to retest,
for subjects who refused the initial Goodenough, some support to
the statement above that missing cases would have strengthened
the finding obtained with available cases. It may also be argued
in support of the data presented in the Binet report (4) that refusal
of the Goodenough test was indicative of some initial test
situation factor or factors that were operating to depress the initial
Binet test scores of these subjects. These factors, whatever
they were, appear to have been considerably relieved by attend- 
ance for a short time in kindergarten. At the times of the final 
tests, which averaged approximately two months after the initial 
test, no subject refused to try on the Goodenough test. 


TABLE 3.—INITIAL AND FINAL GOODENOUGH MENTAL AGE SCORES

MA in      Freq.       Freq.        Mo. MA Change
Months     1st Test    2nd Test     Test to Retest     Frequency
90-99       1           2            30 to  39           3
80-89       6           8            20 to  29          18
70-79       5          18            10 to  19          17
60-69      26          31             0 to   9          39
50-59      29          19            -1 to  -9          11
40-49      11           5           -10 to -19           4
30-39       5           0           -20 to -29           1
N          83          83                               83
Range      30-93 Mo.   45-93 Mo.    -21 to +36 Mo.
Median     57 Mo.      66 Mo.       +9 Mo.
Mean       58.81 Mo.   65.42 Mo.    +6.6 Mo.
SD         12.07 Mo.   10.98 Mo.    11.44 Mo.

r MA 1-2   +.51 ± .05
CR MA 1-2  5.25*

* CR taking r into account, Garrett (3, p. 217).
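The critical ratio "taking r into account" can be illustrated with the usual formula for the difference between two correlated means, in which the standard error of the difference is √(SE₁² + SE₂² - 2·r·SE₁·SE₂) with SE = SD/√N. The sketch below assumes that this is the formula intended by the reference to Garrett; the source gives only the page citation. The function name is illustrative.

```python
import math


def critical_ratio(mean1, sd1, mean2, sd2, r, n):
    """Difference between two correlated means divided by its standard error."""
    se1 = sd1 / math.sqrt(n)  # standard error of the first mean
    se2 = sd2 / math.sqrt(n)  # standard error of the second mean
    # The correlation term shrinks the standard error of the difference.
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
    return (mean2 - mean1) / se_diff


# Goodenough MA values from Table 3 (months), N = 83, r between tests = .51.
cr_ma = critical_ratio(58.81, 12.07, 65.42, 10.98, 0.51, 83)
print(round(cr_ma, 2))
```

With the Table 3 values this comes out close to the tabled critical ratio of 5.25; small differences are to be expected from rounding in the published means and standard deviations.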


Tables 3 and 4 present the Goodenough test results for initial
and final tests of eighty-three subjects together with changes in
test scores from test to retest. Frequency distributions for
Stanford-Binet initial and final scores of ninety-one subjects,
including the eighty-three above, with changes in Binet test
scores from test to retest, are given in “Changes in IQ at the
Public School Kindergarten Level” (4, pp. 11 and 13).


TABLE 4.—INITIAL AND FINAL GOODENOUGH IQ SCORES

            Freq.       Freq.       Pts. IQ Change
Scores      1st Test    2nd Test    Test to Retest     Frequency
145-154      1           0           41 to  50           4
135-144      4           3           31 to  40           5
125-134      3           3           21 to  30          13
115-124      3          15           11 to  20          10
105-114      8          15            0 to  10          23
 95-104     23          17           -1 to -10          16
 85- 94     20          17          -11 to -20           7
 75- 84      9          10          -21 to -30           1
 65- 74     10           3          -31 to -40           3
 55- 64      2           0          -41 to -50           1
N           83          83                              83
Range       59-150      66-143      -50 to +50
Mdn.        95          100         +5
Mean        94.7        102.1       +7.4
SD          19.38       16.73       18.60

r IQ 1-2    +.46 ± .06
CR IQ 1-2   3.44*

* CR taking r into account, Garrett (3, p. 217).


From Tables 3 and 4 it is seen that eighty-three children tested 
prior to entrance to public school kindergarten and retested after 
a mean attendance at kindergarten of 30.30, SD 12.34, half-day 
sessions (4, Table 8, p. 16) have made quite significant gains in
‘“‘mental growth” (', p. 90) as measured by the Goodenough test. 
It would be unreasonable to ascribe a mean increase in Good- 
enough mental age of 6.6 months and a mean increase in IQ of 
7.4 points to accelerated mental growth as a result of the stimu- 
lating effect of about thirty half-day sessions of public school 
kindergarten environment where no special effort was being made 
to promote mental maturity and the major portion of the time 
was spent in specific training to self help with wraps, toileting, 
etc.; supervised play out of doors, opportunity to learn how to 
use pencils, paper and crayons properly; hearing stories, singing 
songs and learning to march; with as many as thirty children in
one classroom under the supervision of one teacher who,
sometimes, had the assistance of a state normal school trainee. If
times, had the assistance of a state normal school trainee. If 
the Goodenough test measures mental maturity, the short time 
within which a mean of 6.6 months of mental age is added by these 
eighty-three subjects precludes, in the light of present knowledge 
of mental growth rate in relation to increases in chronological age, 
an interpretation of this gain in terms of accelerated mental 
growth.* Some other explanations for these changes will be 
proposed. 

If the changes in Goodenough IQ are arranged in relation to
size of initial Goodenough IQ and initial size of Binet IQ as is
done in Tables 5 and 6, we find a very definite tendency for
subjects who scored initially high on the Goodenough test to make
lower scores on the second Goodenough test, and for initially low
scorers to improve considerably on the second test.


TABLE 5.—DISTRIBUTION OF GOODENOUGH IQ CHANGES ACCORDING
TO LEVEL OF INITIAL GOODENOUGH IQ SCORES

Level of Initial     Number of     Mean Goodenough IQ Points
Goodenough IQ        Subjects      Change 1st to 2nd Test
145-154               3            -30.33
135-144               2             -3.00
125-134               3             -9.33
115-124               3             -7.33
105-114               8              +.38
 95-104              23             +3.61
 85- 94              21            +12.38
 75- 84               9            +11.00
 65- 74               9            +28.67
 55- 64               2            +22.50
N                    83


When the
changes in Goodenough scores are arranged in relation to the
initial Binet scores, which are considered to be a better criterion
of the subjects’ intellectual status at entrance to kindergarten,


* cf. Wellman’'*.*-!° who proposes that changes in IQ scores from test to 
retest after pre-school experience result from pre-school experiences of the 
child and growth in intelligence due to the stimulation of these experiences 
rather than to errors in testing due to factors which operate to depress initial 
IQ scores at this age level.

the tendency for high initial scorers to lose and low scorers to
gain almost completely disappears, being replaced by a tendency
for high initial scorers to gain less on the second test than low
initial scorers. It is difficult to believe that so short a period of
public school kindergarten attendance as is reported here could
be proved to be so intellectually depressing for the entering
kindergarten child as to reduce his ability to learn by as much as
fifty points (see Table 7) on an index of this ability. In
accounting for changes in IQ at the pre-school level Wellman has stated:
“Theoretically if a child’s IQ is as high as the particular
environment is capable of producing in him, there will be no increase;
conversely, if the environment is a depressing one, the amount of
drop will be greater when his initial status is higher.” (10, p. 36.)


TABLE 6.—DISTRIBUTION OF GOODENOUGH IQ CHANGES ACCORDING
TO LEVEL OF INITIAL BINET IQ SCORES

Level of Initial     Number of     Mean Goodenough IQ Points
Binet IQ             Subjects      Change 1st to 2nd Test
125-134               6              +.33
115-124               8             -1.75
105-114              15             +8.53
 95-104              32            +11.75
 85- 94              18             +3.28
 75- 84               3            +11.66
 65- 74               1             +4.00
N                    83
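As a rough arithmetic check on Tables 5 and 6, the group means can be pooled, weighting each by its number of subjects. The sketch below is illustrative only; it simply re-enters the tabled rows. Each pooling should account for all eighty-three subjects and fall within a fraction of a point of the over-all mean gain of 7.4 IQ points given in Table 4, since rounding of the group means prevents exact agreement.

```python
# (number of subjects, mean change in Goodenough IQ) for each row, as tabled.
table5 = [(3, -30.33), (2, -3.00), (3, -9.33), (3, -7.33), (8, 0.38),
          (23, 3.61), (21, 12.38), (9, 11.00), (9, 28.67), (2, 22.50)]
table6 = [(6, 0.33), (8, -1.75), (15, 8.53), (32, 11.75),
          (18, 3.28), (3, 11.66), (1, 4.00)]


def pooled_mean(rows):
    """Return (total N, subject-weighted mean change) for a list of table rows."""
    total_n = sum(n for n, _ in rows)
    weighted_sum = sum(n * change for n, change in rows)
    return total_n, weighted_sum / total_n


n5, mean5 = pooled_mean(table5)
n6, mean6 = pooled_mean(table6)
print("Table 5:", n5, round(mean5, 2))
print("Table 6:", n6, round(mean6, 2))
```

The same eighty-three changes are merely sorted in two ways, so the two pooled means should agree with each other as well as with the over-all mean.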


An analysis of the Goodenough data on which Table 5 is based 
reveals that fifteen subjects lost a minimum of 7 IQ points each on 
the Goodenough test from test to retest, while forty gained a 
minimum of seven points each under the same circumstances. 
If one will accept the best Binet IQ of each subject regardless of 
whether this was earned on the initial test prior to entrance to 
kindergarten or at the end of the first, second or third month 
after entrance, as the best indication of the intelligence with 
which the subject was equipped to meet and react to the kinder- 
garten environment, one has a criterion against which to evaluate 
the changes in Goodenough IQ. Table 7 shows the relationship 
between significant gains and losses in Goodenough IQ and the
best Binet IQ as a best available criterion of the subjects’
intellectual status at entrance to kindergarten. Gains and losses of
seven points and over have been selected as significant changes in 
Goodenough IQ because of Goodenough’s statement that ‘‘the 
probable error of estimate of an IQ is approximately 5.4 points 
at all ages from five to ten years.” (', p. 81.) 


TABLE 7.—GAINS AND LOSSES IN GOODENOUGH IQ AS RELATED
TO BEST BINET IQ

           Losses of Seven Points           Gains of Seven Points
                  or More                          or More
           Best         1st to 2nd          Best         1st to 2nd
           Binet        Test Losses in      Binet        Test Gains in
           IQ           Goodenough IQ       IQ           Goodenough IQ
Range      90 to 126    -7 to -50           85 to 133    +7 to +50
Mdn.       109.00       -14.00              108.50       +23
Mean       110.67       -19.07              108.03       +22.15
SD         8.94         12.01               10.94        11.44
N          15           15                  40           40


If the best Binet IQ obtained in two testings by the same exam- 
iner is accepted as a criterion of intellectual status at entrance to 
kindergarten, Table 7 makes it obvious that losses in Goodenough 
IQ from test to retest did not result from the depressing effect of 
the kindergarten environment upon intelligences superior to its 
stimulus value. Subjects of equal and, in some instances, better 
intellectual status made equal and, in some instances, greater 
gains than the losses obtained under the same circumstances. 
It is suggested that the above results indicate that the search for 
the explanation of gains and losses in intelligence at the pre-school 
level may be sought in other areas than the stimulus value of the 
pre-school environment. It is proposed that a fruitful area of 
search may lie in the relationships between the environment from 
which the subject comes to the initial test and the initial test 
scores. If there is found to be a significant positive relationship 
between these two, the influences of the environment may then
be evaluated in terms of an inverse relationship between gains in
intelligence scores as a result of its stimulation and the scores for 
the environment from which the subjects came. 

Table 8 shows the relationships obtained in this study between 
rough measures of the environments from which the subjects 
came to the initial testing situation and the Goodenough scores. 
The measures of environment are: The Barr Occupations Rating 
Scale’; The Whittier Home Rating Scale''; and the Education of 
the Parents, in terms of the number of years spent in school. 
The methods of securing scores on these measures of environ- 
ment and the treatment of scores have been presented in detail 
in the Binet report (4, pp. 26-29).


TABLE 8.—CORRELATION OF MEASURES OF PARENTS’ SOCIO-
ECONOMIC STATUS WITH GOODENOUGH MA’S AND IQ’S AT
THE TIME OF THE FIRST AND SECOND TESTS AND
WITH CHANGES IN GOODENOUGH MA AND IQ
FROM FIRST TO SECOND TEST

                                              Change                            Change
Ratings           N    MA 1st     MA 2nd      in MA       IQ 1st     IQ 2nd     in IQ
Barr occup.       74   .40±.07    .21±.08    -.36±.07     .40±.07    .17±.08   -.44±.07
Whittier          78   .13±.08    .11±.08    -.01±.08     .12±.08    .09±.08   -.04±.08
Edu. father       77   .44±.07    .15±.08    -.30±.07     .45±.07    .15±.08   -.34±.07
Edu. mother       77   .14±.08    .17±.08     .08±.08     .17±.08    .23±.08    .05±.08
Mid-parent edu.   77   .34±.07    .18±.08    -.14±.08     .36±.07    .21±.08   -.17±.08


A comparison of Table 8 with Table 21 of the Binet study 
(4, p. 27) shows that there is much more relationship, on the 
whole, between the Goodenough scores and these measures of the 
home environments than was obtained between Binet scores of 
the same subjects and these measures. In accounting for the 
clearer trend to relationship between initial Goodenough scores 
and measures of the environment, it seems reasonable to propose 
that the initial score on the Goodenough test prior to entering 
kindergarten may depend on a smaller number of possible home 
experiences than does the initial score of the Binet test. For 
example, initial scores on the Goodenough test may be markedly 
influenced by background of experience with drawing and draw- 
ing materials. In accounting for the more definite trend to an 
inverse relationship between these measures of home
environment and changes in Goodenough score from test to retest after
kindergarten experience, it seems reasonable to propose that the
effect of a short period of kindergarten experience on Goodenough 
scores may be more marked for those who begin school poorly 
equipped with whatever is necessary for achieving a best Good- 
enough score in relation to true ability. It is suggested that the 
measures of home environment used here have indicated the 
possible existence of factors which predispose a subject to score 
below ability on an initial Goodenough intelligence test without 
revealing what these environmental factors are specifically. 

The environment measures used here are too crude to support 
claims that anything definite has been discovered regarding home 
environments which may predispose subjects to score lower than 
their abilities warrant on an initial intelligence test or tests prior 
to school experience. The data in Table 8 are not entirely con- 
sistent and some of the relationships are not significant. In the 
case of failure to obtain results in the expected direction with 
the mother’s educational status measured in terms of years of 
school attendance one might argue that, since girls remain in 
school longer than boys regardless of ability, this educational 
status of mothers is a much cruder index of the environment they 
provide for the children than the same data on the fathers. The 
data in Table 8 indicate a significant relationship between cer- 
tain aspects of environment from which these subjects came to 
their initial Goodenough intelligence test prior to kindergarten 
experience and the scores earned on the initial test. This indi- 
cated relationship does not solve the riddle of significant increases 
in Goodenough intelligence test scores after a period of pre-school 
experience, but it does point the search for the cause in a direction 
other than explanation in terms of accelerated mental growth 
as a result of kindergarten experience. 





SUMMARY AND CONCLUSIONS 


A group of eighty-three public school kindergarten children 
were initially tested on the Goodenough Drawing a Man Test 
during two weeks prior to entering school and retested on the 
same test at either one, two or three months after the initial 
test. The mean interval between tests was 1.93 months during 
which the subjects attended a mean of thirty half-day sessions of 
school. The mean CA of the subjects at the time of the first test 
was 62.02, SD 4.38 months.

Significant gains in Goodenough scores from test to retest have
been registered by these subjects. The mean increase in
Goodenough MA is 6.6, SD 11.44 months, and the mean increase in
Goodenough IQ is 7.4, SD 18.60 points. The critical ratio
between first and second MA is 5.25 and the critical ratio between
first and second IQ is 3.44. In both of these critical ratios the
correlation between initial and final scores has been taken into
account (3, p. 217).

Significant relationship has been demonstrated between initial 
Binet and Goodenough scores and between final Binet and Good- 
enough scores, but these relationships are not as favorable when 
the two forms of the 1937 Revision of the Stanford-Binet are used 
as those reported by Goodenough with the 1916 Stanford Binet. 

If the best of the two Binet IQ’s secured by the same examiner
during the period of examination is accepted as a criterion of the 
intelligence with which the subjects met and reacted to the 
kindergarten environment, there is found to be no relationship 
between gains and losses in Goodenough IQ and the degree of 
intelligence with which the subjects entered the kindergarten 
environment. 

Significant positive relationships between some measures of 
the environment from which the subjects came to kindergarten 
and initial intelligence test scores, before a short experience in 
kindergarten, indicate that a fruitful area of search for the 
explanation of low initial Goodenough intelligence test scores on 
preschool children may be found in the examination of their 
experiences with situations similar to the testing situation prior 
to school experience. 

Significant inverse relationships between gains in Goodenough 
intelligence test scores after kindergarten experience and some 
measures of the environment from which the subjects came to 
kindergarten indicate that a fruitful area of search for the explana- 
tion of gains may be found in the differences between the kinder- 
garten environment and the home environment, and the effect of 
these differences with regard to scoring or failing to score near 
true ability on a Goodenough intelligence test.



A CLASSROOM EXERCISE
FOR DEMONSTRATING CERTAIN 
CHARACTERISTICS OF LEARNING 


C. H. LAWSHE, JR. 


Division of Education and Applied Psychology 
Purdue University 


Effective demonstration exercises that meet the dual criterion 
of interesting and stimulating the student on the one hand and of 
illustrating significant psychological principles on the other are 
all too uncommon. The present paper describes a learning 
exercise that meets both of these standards and, in addition, is 
relatively easy to administer even to lecture groups of a hundred 
or more. It yields reasonably consistent results from class to 
class and has been used with non-student industrial groups with 
equal success. It adequately illustrates six key points that merit 
emphasis in the discussion of learning in any beginning course in
psychology.


THE EXERCISE


Materials.—A jumbled numbers sheet like the one described
by Aiken and Lilly¹ is used. Numbers ranging from 1 to 60 are
arranged on an 8½ x 11 sheet in what appears to be a completely
random fashion with the number ‘1’ in a circle in the upper left 
corner. Actually, the numbers are positioned according to a 
simple system which, when known, greatly reduces the amount of 
random ‘looking’ for any particular number. Each member of 
the class is supplied with sixteen sheets which he is instructed to 
keep face downward. A simple statement of the ‘rule’ explaining 
the positioning system is mimeographed on the back of one of the 
number sheets. This sheet carrying the rule is inserted as num- 
ber four for a third of the students (called Group A), as number 
ten for another third (called Group B), while the remaining third 
of the class (called Group C) receives no statement of the rule. 
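The make-up of the packets can be sketched in a few lines. This is only an illustration of the arrangement just described: the random dealing of students into thirds is an assumption (the text does not say how the groups are formed), and the names `assign_groups` and `build_packet` are hypothetical.

```python
import random

# Position of the sheet carrying the rule in each group's packet (None = no rule).
RULE_POSITION = {"A": 4, "B": 10, "C": None}
SHEETS_PER_STUDENT = 16


def assign_groups(student_ids, seed=0):
    """Shuffle the students and deal them into three near-equal groups (assumed)."""
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    return {sid: "ABC"[i % 3] for i, sid in enumerate(ids)}


def build_packet(group):
    """Return the sixteen sheets in order; True marks the sheet backed by the rule."""
    rule_at = RULE_POSITION[group]
    return [(sheet_no, sheet_no == rule_at)
            for sheet_no in range(1, SHEETS_PER_STUDENT + 1)]


groups = assign_groups(range(1, 31))  # a hypothetical class of thirty
packets = {sid: build_packet(group) for sid, group in groups.items()}
```

Every student thus holds sixteen identical number sheets; the groups differ only in whether, and at which trial, the rule is met.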

Procedure.—Instructions similar to the following are given:
“In connection with our study of learning, I am going to place
you in a learning situation. On the other side of these sheets are
numbers running consecutively from 1 to 60 in jumbled fashion
like this (show a sheet quickly). All of the sheets are alike and
the number ‘1’ is always in a circle in the upper left corner of the
page. When I give the signal, turn your paper over, and with
your pencil connect the ‘1’ with the ‘2,’ the ‘2’ with the ‘3,’ and
so on, connecting as many numbers in sequence as you can in the
time provided. You may find a note written to you on the back
of one of your sheets. If you do, read its contents but do not let
the others know. Do not talk to anyone. Ready; go.” Forty
seconds are allowed on each trial with approximately the same
time between trials.

¹ C. C. Aiken and Scott B. Lilly, Teacher Training for Industry. New
York: McGraw-Hill, 1942, pp. 19-25; 138.

Following the sixteen trials, each member of the class plots on
a prepared graph sheet the number of figures he located on each
of the sixteen trials and indicates the group to which he belongs
(A, B, or C). The graphs and the number sheets are then
collected. At the next meeting of the class, the graphs are
distributed, the ‘rule’ is discussed, and slides reproduced from the
figures in this paper are shown to illustrate the lecture which is
built around the six key points presented below.


INSTRUCTIONAL KEY POINTS 


The Fact of Individual Differences.—Do students differ markedly
in their ability to perform such tasks as the location of numbers?
The data obtained in the demonstration exercise add meaning to
any discussion of individual differences that may have preceded.
The fact that members of the class themselves have participated,
of course, adds considerable interest to the presentation. Figure
1 shows the score distribution of approximately two hundred
students on the initial trial of the exercise. It is pointed out that
the ‘average’ student located twelve numbers, that some students
located as few as two, and that some located as many as
twenty-one. A quick show of hands in response to questions as to how
many performed above or below a particular score is helpful.
Pertinent comments are also made about the normal curve and
its significance.

Figure 1. Distribution curve showing the percentage of approximately
200 students scoring at various levels on the number location exercise.

The Pattern of the Learning Curve.—What is a learning curve;
what does it look like? Group C (which did not receive the rule)
is used to demonstrate the pattern of the learning curve. Figure
2 shows a free-hand curve fitted through the mean scores for
each of the sixteen trials. The data are adequate as a basis for
discussion, even though improvement is still taking place on the
sixteenth trial. Best results are obtained when this slide is
followed by additional ones showing learning curves from other
sources.

Figure 2. Learning curve drawn through the mean scores of each of sixteen
trials on the number location exercise (Group C).

The Effect of Practice on Individual Differences.—What
happens to the individual differences demonstrated in Figure 1 after
all of the persons have had equivalent amounts of practice? Do
the differences disappear; do they increase? Figure 3 is used to
discuss this question. The students comprising Group C were
divided into two subgroups on the basis of their performance on
the initial trial. That is, the half that did best on the first trial
constituted one group and the half that did poorest constituted
the other. The mean scores for each of the two subgroups were
computed for each of the sixteen trials and their respective curves
are shown in Figure 3. The graphic presentation provides an
illustration of the principle that, while everyone improves, those
who excel at first tend to continue to excel and those who are
inferior at first tend to continue to be inferior. There is also a
tendency² for the two groups to get farther apart as the trials
continue.

Figure 3. The mean performance of two subgroups of Group C classified on
the basis of performance on the first trial. (Vertical axis: numbers
located; horizontal axis: trial, 1 to 16.)

² None of the differences reported have been tested for statistical
significance. The data included are not intended to constitute proof of what are
already well established facts; they are only illustrative.

Figure 4 also adds meaning to this discussion. Again Group C
was used and percentile curves for scores on the first trial and
last trial are presented. While the best subject located
approximately nineteen more numbers than did the poorest student on
the first trial, the best subject located about forty-three more than
the poorest student on the final trial. This fact leads logically
to the generalization that in complex tasks, such as locating
numbers, individual differences generally increase with constant
practice or training.

Figure 4. Percentile curves of the performance of Group C on the first and
last trial.

Also of consequence in Figure 4 is the fact that there is an
overlap in the two percentile curves. In other words, about three
per cent of the group did not do as well on the sixteenth trial as
did the best performers on the initial trial.
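The computation behind the subgroup comparison can be sketched as follows. The scores are hypothetical, made up purely for illustration (six students, four trials instead of sixteen), and the function names are not from the original exercise. The sketch ranks Group C by first-trial score, splits it in half, and averages each half trial by trial.

```python
from statistics import mean

# Hypothetical numbers-located scores for six Group C students.
# Only four trials are shown here; the exercise itself runs sixteen.
scores = {
    "s1": [14, 17, 19, 22],
    "s2": [9, 11, 12, 14],
    "s3": [16, 19, 23, 26],
    "s4": [7, 8, 10, 11],
    "s5": [12, 15, 17, 20],
    "s6": [10, 11, 13, 14],
}


def split_by_first_trial(all_scores):
    """Rank students by first-trial score and return (upper half, lower half)."""
    ranked = sorted(all_scores, key=lambda sid: all_scores[sid][0], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def mean_curve(all_scores, ids):
    """Mean score of the listed students on each successive trial."""
    return [mean(trial) for trial in zip(*(all_scores[sid] for sid in ids))]


upper, lower = split_by_first_trial(scores)
upper_curve = mean_curve(scores, upper)
lower_curve = mean_curve(scores, lower)

# Gap between the subgroup means on each trial; a widening gap is the
# pattern the discussion of practice and individual differences describes.
gap = [u - l for u, l in zip(upper_curve, lower_curve)]
print(upper_curve, lower_curve, gap)
```

Plotting `upper_curve` and `lower_curve` against trial number gives the kind of paired curves used in the lecture; with real class data the same two functions apply unchanged to all sixteen trials.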

Insight and Learning.—One who possesses insight is aware of 
certain relationships and their pertinence to the problem at hand. 
Whether this awareness is self-acquired or whether it results from 
specific instruction is another question. Assuming that the 
members of Group A understood the ‘rule’ which was given to 
them between the third and fourth trial, they were able to pro- 
ceed with a certain insight or awareness of relationships that was 
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Figure 5. Curves showing the mean performance of Group C which received 
no help and Group A which received the ‘rule’ before the fourth trial. 
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Figure 6. Curves showing the mean performance of Group C which received 
no help and of Group B which received the ‘rule’ before the tenth trial. 
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not possessed by the vast majority of Group C. (Usually less 
than five per cent of Group C discover the ‘rule’ themselves 
without assistance.) Figure 5 shows the mean performance of 
the two groups on the several trials. It will be noted that they 
performed about the same on the first three trials. Following 
the giving of the ‘rule’ to Group A between the third and fourth 
trial, the performance of Group A excels that of Group C at all 
times. 
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Figure 7. Curves showing the mean performance of the upper and lower 
half of Group C when divided on the basis of mental alertness test scores. 


Interference or Negative Transfer.—Does the acquisition of cer- 
tain habits actually retard the learner when he attempts to build 
certain other habits? Figure 6 presents an excellent illustration 
of this point. Again shown is the mean performance of Group C. 
Also, there is plotted the mean performance of Group B which 
received the ‘rule’ between the ninth and tenth trials. It will be 
noted that the two groups appear approximately equal for the 
first nine trials. Following the introduction of the ‘rule’ to 
Group B, however, there is a sharp drop in performance which is 
followed by a partial recovery. Within the period of the exer- 
cise, Group B did not get back to the general level of the group 
that received no help. 
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The Rôle of Intelligence.—What part does intelligence play in 
the learning activity? Members of Group C were again classified 
into two subgroups, this time on the basis of their performance 
on a standard test of mental alertness.³ The mean performance 
of the ‘most intelligent’ and the ‘least intelligent’ half 
have been plotted in Figure 7. Notable is the fact that on the 
early trials, small if any differences are obtained. Possibly 
intelligence is relatively unimportant in ‘locating’ numbers. As 
the trials progress, however, the process of ‘remembering’ loca- 
tions once they have been found apparently is related to intelli- 
gence. At any rate, the difference between the two groups tends 
to increase as the trials continue. 


SUMMARY 


A simple classroom learning exercise involving the location of 
jumbled numbers on a page is described. Sixteen trials of forty 
seconds’ duration provide data for discussing the following six 
key points: the fact of individual differences, the pattern of the 
learning curve, the effect of practice on individual differences, 
insight and learning, interference or negative transfer, and the 
role of intelligence in learning. 





3 Joseph Tiffin and C. H. Lawshe, Jr., “The Adaptability Test: A Fifteen- 
minute Mental Alertness Test for Use in Personnel Allocation,” J. appl. 
Psychol., 1943, 27, 152-163. 








TESTING FOR APTITUDES¹ 


ROBERT A. DAVIS 


University of Colorado 


Testing for aptitudes in college should be regarded as one 
more link in the chain of testing and evaluating which has been 
in progress throughout the student’s elementary- and secondary- 
school years. For the typical student just entering the college 
there is already available a considerable amount of data relating 
to his abilities and interests as revealed by his previous record. 
These data may include not only the student’s scholastic record, 
but his more informal activities, preferences and work expe- 
riences, together with opinions of school officials regarding his 
educational and vocational promise. These formal and informal 
records of the entering student provide evidence of potential 
abilities and interests and form a background for the interpre- 
tation of test results. 

The first obligation of the college is to evaluate, on the basis 
of test scores and other data, the student’s present abilities and 
interests, and to use this evaluation in predicting his performance 
in the future. Minimum essentials of a testing program for 
aptitudes to meet this requirement include two types of measure: 
(1) tests to determine the student’s potential learning ability, 
and (2) tests to determine his interests. 


1. TESTS WHICH MEASURE POTENTIAL LEARNING ABILITY 


The program for the measurement of aptitudes should begin 
with the administration of tests designed to measure the indi- 
vidual’s learning rate and ability to profit by training and instruc- 
tion. Because test-makers construct such tests to measure 
potential learning ability rather than actual accomplishment, 
care is taken, in preparation of test materials, to include items 
responses to which are indicative of the learner’s performance in 
the future. They thus attempt to reduce to a minimum the 
ability of a person being tested to utilize information stored in 
memory. These tests may be conveniently classified as: (a) 
tests designed to present an ‘overall’ picture of the learner’s 





1 Abstract of address before Junior-College Conference held at the Uni- 
versity of Colorado, August 14-19, 1944. 
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abilities, and (b) those more specific in nature and planned to 
predict efficiency in particular types of activity including subject- 
matter in college. Each type of test, therefore, is supposed to 
forecast the performance of the individual in learning situations. 

a) Tests designed to present an ‘overall’ picture of the learner’s 
abilities.—Testing programs properly should begin with the 
administration of a psychological examination constructed to 
yield a survey of the learner’s general abilities. This general 
examination, as the term implies, attempts to measure the 
learner’s reaction to varying types of material so that the total 
score resulting from composite treatment of its various sections 
indicates the student’s potential learning ability in a variety of 
learning situations. Throughout the history of psychological 
testing the aim has been to select testing materials which reveal 
the learner’s performance in different situations. Consequently, 
in order that the total score resulting from composite treatment 
of the several sections of a test may correlate significantly with 
outside criteria such as achievement, teachers’ estimates, or 
other tests of known or assumed validity, performance on the 
different sections should not correlate closely with each other. 
Low intercorrelations among different sections of a test imply 
that different mental functions operate; high intercorrelations 
imply that similar mental functions are involved. The different 
sections of some typical general psychological examinations 
correlate with each other from approximately 0.10 to 0.30, with 
a median of 0.15 or 0.20. In only a few instances have test- 
makers applied factor-analysis techniques as a means of elimi- 
nating overlapping abilities. Nevertheless, authors of tests 
have been sufficiently scientific in selecting and validating test 
materials to justify the claim that their tests are surveys of 
general abilities. 
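
The argument above, that low intercorrelations among sections favor a high correlation between total score and an outside criterion, follows from the standard formula for the correlation of a sum of equally weighted standardized parts with a criterion. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the figure of 0.30 for the mean validity of a single section is an assumed value chosen for illustration, not one reported in the text.

```python
import math


def composite_validity(k, mean_validity, mean_intercorrelation):
    """Correlation between a criterion and the unweighted sum of k
    standardized sections, given the mean section-criterion correlation
    and the mean correlation among the sections."""
    numerator = k * mean_validity
    denominator = math.sqrt(k + k * (k - 1) * mean_intercorrelation)
    return numerator / denominator


# Assumed: five sections, each correlating 0.30 with the criterion.
for r_sections in (0.15, 0.60):
    total = composite_validity(5, 0.30, r_sections)
    print(f"mean intercorrelation {r_sections:.2f} -> "
          f"composite validity {total:.3f}")
```

With the section validities held constant, the composite correlates more highly with the criterion when the sections overlap little, which is the reason given for preferring sections that do not correlate closely with each other.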

Following the lead of Binet and Terman general psychological 
tests have tended to stress the measurement of higher mental 
processes, emphasizing the learner’s ability to react to verbal 
materials. They thus tend to be heavily weighted with language 
responses as opposed to motor reactions. The belief that tests 
of ‘general intelligence’ should be measures of ability to think 
by means of symbols has been generally accepted in constructing 
tests for use in secondary schools and colleges. And this empha- 
sis upon abstractions and symbols has made such tests particu- 
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larly useful in predicting general scholastic efficiency. The more 
closely test materials coincide with the materials and mental 
processes required by learning situations in school and college, 
the greater their predictive value. Psychological tests which 
stress the measurement of verbal ability correlate with achieve- 
ment in verbal subject-matter as highly as 0.50 or 0.60; those 
which minimize the verbal factor, on the contrary, may not 
correlate more than 0.30 or 0.40. These results apply, of course, 
to the total score resulting from composite treatment of the 
different sections of a test. In contrast, if predictions are desired 
in more specific learning situations, as, for example, in the case 
of foreign languages and mathematics, sections of the psycho- 
logical examination which emphasize language and numbers will 
be even more highly predictive of performance in those subjects. 
The psychological examination, therefore, if treated analytically 
is the first step in the exploration of special abilities and interests. 

b) Tests which measure special abilities.—The next step in the 
measurement of potential abilities should be the administration 
of aptitude and prognostic tests as a focal attack in the search 
for special abilities. The general psychological test is the 
general approach to the discovery of abilities; the aptitude and 
prognostic tests the specific approach. The former is extensive, 
the latter are intensive. 

Measurements of special abilities include ‘aptitude’ tests in 
fields such as music, art and mechanics; and ‘prognostic’ tests 
in subjects such as mathematics and foreign languages. Apti- 
tude tests have tended to be more particularly concerned with 
vocational fields; prognostic tests, with school subjects. In each 
case, however, the objective has been to arrange testing situ- 
ations which will be indicative of the learner’s efficiency in 
specific situations. Since aptitude and prognostic tests are 
designed for specific purposes, and testing materials so selected 
as to be representative of types of performance sought, it is no 
accident that they tend to have greater validity in specific 
situations than general psychological examinations which meas- 
ure superficially a wide range of abilities. Scores on some 
general psychological examinations, for example, correlate with 
achievement in geometry 0.50 or 0.60. A prognostic test in 
geometry, on the other hand, may correlate with accomplish- 
ment in geometry as highly as 0.70 or 0.80. And in the case of 
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certain jobs and trades, in which activities are reduced to specific 
skills, validity coefficients may be as high as 0.80 or 0.90. 
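
One common way of comparing validity coefficients such as those quoted above is to square them, since the square of a correlation gives the proportion of variance in the criterion associated with the test. The short Python sketch below applies this to the lower figure of each range in the text; the function name and the labels are hypothetical, and the squared values are an interpretive aid rather than results reported by the author.

```python
def shared_variance(r):
    """Proportion of criterion variance associated with the test (r squared)."""
    return r * r


# Lower bound of each range quoted in the text.
examples = [
    ("general examination and geometry", 0.50),
    ("prognostic test and geometry", 0.70),
    ("test of specific trade skills", 0.80),
]

for label, r in examples:
    print(f"{label}: r = {r:.2f}, r squared = {shared_variance(r):.2f}")
```

On this reading a rise in the coefficient from 0.50 to 0.70 roughly doubles the proportion of variance accounted for.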

Aptitude and prognostic tests, despite their value for predic- 
tion, however, have not as yet been extensively developed for use 
in secondary schools and colleges. Test-makers appear to have 
concentrated their efforts on the construction of tests for certain 
specialized fields such as music, art, and mechanics; in the wider 
realm of vocational and educational activities only a beginning has 
been made. Tests designed to measure ability to master specific 
academic subjects have been conspicuously lacking in number 
and quality; prognostic testing in such fields has also centered 
largely in foreign languages and mathematics. Several obstacles 
have been encountered. We do not as yet know whether there 
is justification for constructing tests for each of the major fields 
of subject-matter. We do not know whether abilities for several 
subjects are bound together by a common factor, or whether 
they are sufficiently varied in the kind of response required to 
warrant tests which measure special abilities. In the mean- 
time educators are faced with the problem of providing for the 
increasingly wide differences between individuals, and at the 
same time are expected to recognize variations within the indi- 
vidual. Not until testing has advanced to a point where abilities 
are identified and isolated will it be possible for the schools to 
give due recognition to special abilities and interests. 


2. INTERESTS 


Although potential abilities rank high among the list of factors 
indicative of potential success, the individual’s active interests 
constitute the driving force in his achievement. Measures of 
potential learning ability are significant, therefore, only insofar 
as they indicate effective performance. 

The scientific study of interests has consisted principally in 
cross-sectional investigations showing the preferences and 
activities of various age and grade groups. Facts gathered by 
such means provide valuable information for curriculum-builders 
and counsellors. The information would be still more valuable, 
if data were available for the same individuals over a wide span 
of years. Despite these limitations a great deal is known con- 
cerning interests. Among other things it is known that interests 
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develop gradually from one age level to another just as any other 
phase of development such as physical or mental traits. The 
nature and extent of interests vary widely with intelligence 
levels. Group differences are also relatively unimportant in 
comparison to individual differences. Perhaps the most signifi- 
cant fact for counsellors is that the individual’s experiences in 
school and college play a dominant rôle in determining his type 
and range of interests. 

All of the expressed interests of the student should be con- 
sidered. The college counsellor, however, will be especially 
concerned with academic and vocational interests. Standard- 
ized inventories and questionnaires will form a nucleus for inter- 
pretation of other information gained in a more informal manner. 
The counsellor should make full use of two types of data: (a) the 
student’s reactions to questionnaires, inventories, etc., so con- 
structed as to reveal the extent to which he ‘likes’, ‘dislikes’, or 
is ‘indifferent’ to a variety of activities, and (b) a record of his 
major activities during a given period of time. Through the 
use of information concerning both preferences and activities, 
insight will be gained as to the significance of interests as shown 
by their relation to other aspects of the student’s development. 
It should be possible to determine whether interests are stable or 
vacillating, superficial or genuine, in conflict or harmony with 
demonstrated abilities; and whether they are expanding in cer- 
tain fields while concentrating in others. Most significant of all, 
the counsellor will wish to know whether the college is having 
the effect of modifying present interests, of creating new ones, 
and of assisting the student in gaining deeper understanding of 
his own potentialities. 

The intimate relationship between learning and testing appears 
to have been better recognized by the psychologist than by 
the educator. The psychologist in his laboratory plans learn- 
ing experiments in such a way that every evidence of learning is 
a test of some kind, and every result of testing is evidence of 
learning. Perhaps this fact in itself accounts for the marked 
improvement in learning usually noted in laboratory situations. 
In the schools, however, learning tends to be considered as an 
activity apart from testing. Learning is considered to be of 
primary concern to the student; testing of primary concern to 
the teacher and administrator. As a result the student has the 
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idea that tests are administered principally for purposes of 
appraisal. 

Testing should be regarded as a means of revealing to the 
student his own potentialities and weaknesses, of holding out to 
him desirable goals of accomplishment, and of stimulating him to 
further improvement. Tests achieve their fullest purpose when 
they reveal growth, act as spurs to improvement, and serve as 
instruments which students will find useful in their own evalu- 
ation. If tests serve the student, rather than the school, the 
student will be the first to indorse their use. 

The popularity of psychological examinations, particularly 
the ‘general intelligence test,’ has been due to the assumption 
that such tests were measures of hereditary equipment and con- 
sequently relatively free from the influences of training and 
environment. The IQ in particular has been regarded as com- 
paratively invariable and immutable. Criticism of this notion 
has been voiced at various times during the development of the 
testing movement. Bagley’s vigorous protest of the mis-uses 
of the intelligence test in educational planning as expressed in a 
series of papers and addresses culminating in the publication in 
1925 of his now classic book, Determinism in Education, dealt a 
staggering blow to many of the proponents of homogeneous 
grouping. But in general the rank and file of school people con- 
tinued to subscribe in practice to Terman’s determinism. They 
regarded the child’s IQ as the principal determinant of his suc- 
cess or failure. Acting on this belief it was a simple matter to 
chart educational and vocational plans. 

Another challenge was presented in 1938 when the “Iowa 
studies” by Stoddard and others began to appear. Briefly, 
these studies showed that marked shifts in environmental influ- 
ence have the effect of raising or lowering the IQ, depending upon 
the type of influence. Publication of these studies was immedi- 
ately met with a storm of protest by some members of the 
American Psychological Association, particularly the followers 
of Terman. Accusations of unscientific procedures in the collec- 
tion and interpretation of data and conclusions unwarranted by 
the facts were given wide publicity. These criticisms, which in 
some instances were highly colored with emotion, indicated per- 
haps better than anything else the extent of belief in the idea of 
constancy even as recently as 1938. The Iowa studies will have 
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served a useful purpose if they make educators conscious of 
the variability of psychological test scores and renew their faith 
in the possibility of improving intelligence. 

It is also noteworthy that our ideas regarding the growth and 
limits of ability have undergone modification. As facts con- 
cerning growth of intelligence and learning ability have increased, 
maximum growth is no longer fixed at sixteen or eighteen years, 
but has been extended to the years of twenty or twenty-two. 
There is the clear implication that, as investigators have extended 
testing to persons in the upper ages, and included materials 
appropriate for the measurement of such persons, more accurate 
appraisal of adult abilities is possible. The results from studies 
of adult abilities suggest that learning and improvement are 
possible even at an advanced age. Perhaps a principal reason 
for prolonged plateaus in personal and professional growth is 
the lack of an impelling desire for improvement and the absence of a 
stimulating environment. The educator, in the effort to be 
realistic in counselling, has, perhaps, been too hasty in placing 
limitations on the student’s level of aspiration and accomplish- 
ment. He has failed to consider the evident fact that students 
of all grades of ability tend to set their levels of aspiration too low 
rather than too high. 











A NEW FORM OF EXAMINATION 
IN THE SUBJECT-MATTER OF PSYCHOLOGY 


EDWARD E. ANDERSON 
Wilson College 


The importance of motivational and interest factors in the 
measurement of intelligence and aptitudes has long been recog- 
nized. Problems and questions that arouse and maintain a high 
degree of attention and interest are selected for such tests. That 
interest factors should also be taken into account in the develop- 
ment of adequate examinations used in college course work does 
not appear to be so clearly recognized. While many instructors 
have undoubtedly developed novel examination forms of high 
interest value, access to such material is not easy and a worth- 
while technique of measurement may be known to only a small 
number of those who might find it of value. The present note 
describes a comprehensive examination in psychology which 
seems to have had unusual interest value for the students who 
have taken it. 

The examination to be described here was developed for use as 
one part of the comprehensive examination given to majors in 
Psychology at the end of their senior year. The comprehensive 
examination is supposed to measure some ability not usually 
measured by ordinary course examinations. It is intended to 
measure breadth and depth of knowledge as well as the student’s 
ability to grasp principles and combine, relate, and integrate 
information from diverse sources. The examination period was 
six hours in length, divided into three-hour sessions, morning 
and afternoon. At first, essay questions were used for this 
examination, but it soon became apparent that it was difficult 
to test the desired breadth of knowledge with an exclusively 
essay test. Difficulty was encountered in providing the student 
with sufficient opportunity to demonstrate his ability to form 
relations and to show insight or creative originality. It, there- 
fore, seemed advisable to develop for a part of the comprehensive 
examination a test which could cover a wider portion of the field 
of psychology and which would offer more opportunities to see 
relationships. Such a test should avoid enmeshing the student 
in a maze of related but largely irrelevant facts by focusing his 
attention on the points at which insight might be shown. The 
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usual forms of objective examinations were considered, but dis- 
carded because they seemed to demand too little of the student 
and to be best adapted for testing rote material. 

It occurred to the writer that a comprehensive examination of 
the type desired was one which would measure with reference to 
the subject-matter of psychology about the same abilities as are 
measured by an individual intelligence test with reference to 
general material. Therefore, an attempt was made to develop 
questions for testing the subject-matter of psychology according 
to the various problems and types of questions used in the Stan- 
ford-Binet intelligence test. Some types of questions used in 
intelligence tests were very easy to adapt to testing the subject- 
matter of psychology, while others were very difficult to so 
develop. A test was finally constructed which consisted of one 
hundred twenty questions patterned after eleven different prob- 
lem forms or question forms used in the Stanford-Binet test. 
The examination was intended to take one of the three-hour 
periods. Blank space for the written answer to each question 
was left on the question sheet. This space was sufficient to per- 
mit a brief and relevant answer, but was deliberately kept limited 
so as to discourage the student from writing too much and from 
wandering away from the point. Although the examination has 
been given as a written examination, the type of questions 
involved appear to be highly suitable for oral examinations. 

Examples of the questions used are given below. Material 
quoted directly from the examination is indented; comments and 
explanatory statements are enclosed in parentheses. 


Wilson College Psycho-Binet Comprehensive Examination 
in Psychology. General Instructions: Answer all questions. 
Do not spend too much time on a single question. First 
work rapidly through the entire examination, then return to 
the more difficult questions. Reduce your answers to the 
essentials but do not sacrifice clarity to brevity. No extra 
paper is necessary as all writing is to be done in the space 
allowed. If you find the space insufficient for your answers, 
you are probably writing too much. Total time for this 
examination is three hours. Suggested approximate time 
limits are indicated at each section for your own conven- 
ience in distributing your time. 
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Section 1. Verbal Absurdities. Each of the following 
statements contains something which is foolish or inconsistent. 
You are to state just what it is that makes the 
statement absurd. 


(This section contains six verbal absurdities of varying difficulty. 
An example follows.) 


A behaviorist reported that when he introspected he did 
not find the three elements (sensations, images, and feelings) 
reported by Titchener but only sensations and images. 
What is foolish about that? 

Section 2. Proverbs. Interpret the following proverbs 
in psychological language. Psychological concepts and 
terminology must be used. Examples with suggested 
answers: “A thing too much seen is little prized.” (Satiation 
results in a decrease of incentive-value of objects. Or, 
satiation reduces motivation.) “The laws of conscience, 
which we pretend are born of nature, are born of custom.” 
(The super-ego is not innately determined but is formed 
through experience.) 


(Six proverbs to be restated in psychological concepts and 
terminology follow these examples. The above examples with 
the suggested answers are included in the instructions to clarify 
the proverbs test for the student.) 


Section 3. Ball and Field Test. 


(This test includes five problems involving principles of Lewin’s 
field theory. An example follows.) 


A child is outside a circular barrier which is six feet in 
diameter; in the center is a ball which the child desires. 
Draw, according to the principles of Lewin, the field of 
forces which would represent this situation. 

Section 4. Induction. 


(This test consists of four problems, each testing different 
subject-matter in psychology. It was intended that the prob- 
lems should provide the student with the data necessary for 
formulating a general rule or law. Whether a given problem 
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will require induction of a particular student depends in part 
upon how the material has been presented in course work, since 
the problems may be answered primarily on the basis of memory 
if the student has been previously required to memorize the 
required psychological law. If the problems are to be of value 
in testing induction, therefore, they must be carefully formulated 
with respect to the past training of the individuals being tested. 
Some of the problems may also be used to test factual knowledge 
by omitting certain of the key words as in the usual completion 
question. An example of the type of question used follows.) 


A tuning fork with a frequency of 435 double vibrations 
per second is sounded simultaneously with a fork having a 
frequency of 436. Beats will be heard and they will have a 
frequency of one per second. If forks with frequencies of 
435 and 437 are sounded together, the beats will have a 
frequency of two per second. If the fork frequencies are 
435 and 442, the number of beats per second will be... . 
The rule is: 

Section 5. Finding Reasons. 


(The problems in this section involve a statement of fact for 
which the student is to provide one or more reasons or explana- 
tions, or the problem consists of a statement of a belief or theory 
for which the student is to provide supporting facts. It was not 
easy to formulate questions of this type suitable for undergradu- 
ates, but the problems appear to be of value in determining 
whether or not the student can distinguish between theory and 
fact. The section includes three problems of which an example 
follows.) 


Give two reasons why forgetting is thought to be an active 
process rather than a matter of passive decay. 

Section 6. Essential Similarities. In the following similarities 
problems, do not give the most general way in which 
the two items are alike, but find the most essential specific 
way in which they are similar. In problem 1, e.g., do not 
say “both are colors,” for the word “color” is used in the 
question; do not say “both are sensations,” for that is too 
general. 
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(This section consists of five problems of which the following is 
problem 1.) 


In what essential specific way is a blue color like a red 
color? 

Section 7. Essential Differences. Give the most specific 
essential way in which the two items differ. 


(Only two essential difference questions were used. More 
items for both Sections 6 and 7 could easily be constructed, but it 
was thought desirable to place more emphasis upon Section 9, 
which requires the student to give both the similarity and differ- 
ence between two items. An example of the essential difference 
type of question follows.) 


What is the principal difference between James’ treatment 
of consciousness and Titchener’s treatment of consciousness? 

Section 8. Sentence Building. Make up a sentence 
which has in it the three words indicated. The sentence 
must be psychological in content and must be psychologically 
correct. The words need not be used in the order in which 
they are given here. 


(Twelve groups of three words are included in this section: two 
examples follow.) 


occipital, discrimination, extirpation. 

fraternal, identical, intelligence. 

Section 9. Essential Similarities and Differences. Give 
the most essential specific way in which the following paired 
items are alike and the essential way in which they are differ- 
ent. Answer the starred items and as many of the others 
as you have time for. 


(Twenty-four paired words are given of which fifteen are 
starred for a particular student. The starred items vary from 
one student to another depending upon the particular courses a 
student has taken in psychology. Three examples of these 
paired words are listed below.) 


Massed and spaced. 
Education and propaganda. 
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MA and IQ. 
Section 10. Reasoning. 


(This section consists of three problems, two of them dealing 
with Spearman’s g theory. In these two problems the formula 
for the tetrad difference and a small correlation table (four tests) 
are given and the student is required to answer the following two 
questions.) 


How many common factors is it necessary to assume to 
account for the correlations in the above table? Why? 
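
(The reasoning called for by this question may be sketched as follows. On Spearman's theory, if a single common factor accounts for the correlations among four tests, every tetrad difference, such as r12·r34 − r13·r24, is zero within the limits of sampling error. The Python sketch below is an illustration under stated assumptions: the correlation table is hypothetical and is not the table printed in the examination, the function names are invented, and the fixed tolerance stands in for a proper test of sampling error.)

```python
def tetrad_differences(r):
    """Return the three tetrad differences for four tests.

    r maps each pair (i, j) with i < j, for tests 1 to 4,
    to the correlation between test i and test j.
    """
    return (
        r[(1, 2)] * r[(3, 4)] - r[(1, 3)] * r[(2, 4)],
        r[(1, 3)] * r[(2, 4)] - r[(1, 4)] * r[(2, 3)],
        r[(1, 2)] * r[(3, 4)] - r[(1, 4)] * r[(2, 3)],
    )


def one_factor_suffices(r, tolerance=0.01):
    """True when every tetrad difference is within the (assumed) tolerance."""
    return all(abs(t) <= tolerance for t in tetrad_differences(r))


# Hypothetical table built from a single factor with loadings
# 0.9, 0.8, 0.7 and 0.6, so that each r is the product of two loadings.
table = {
    (1, 2): 0.72, (1, 3): 0.63, (1, 4): 0.54,
    (2, 3): 0.56, (2, 4): 0.48, (3, 4): 0.42,
}

print(tetrad_differences(table))
print(one_factor_suffices(table))
```

(If the tetrad differences vanish, one common factor is sufficient; if they do not, more than one must be assumed.)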

Section 11. Vocabulary Test. Define each of the starred 
words and as many of the others as you have time for. 


(The student is required to define twenty-five specified words 
out of the fifty words of this vocabulary test. The required 
words vary from one student to another depending upon the 
particular courses in psychology which the student has taken. A 
few examples of words used in this test follow.) 


Attitude. Ergograph. Id. Standard deviation. Etc. 


The examination described above has been very satisfactory 
from both the student’s and the instructor’s standpoint. From 
the students’ reactions it appears that the novel form of the 
examination, its semi-objective character, and the unusual 
approach to psychological subject-matter all contribute toward 
making the examination ‘interesting,’ ‘different,’ and even ‘fun.’ 
This is particularly noticeable with students who have had suffi- 
cient training in mental testing to recognize the source of the 
question forms. From the standpoint of the instructor, the 
examination has several points of value. It is relatively objective 
to score, and it seems probable that with an accumulation of 
answers and a selection of items the scoring can become almost 
as objective as the scoring of an individual intelligence test. 
Since it is relatively easy to construct questions according to 
some of the forms for almost any field in psychology, the examination 
(in either oral or written form) is adaptable to testing a 
wide range of material in psychology. Many of the questions 
appear to require the student to use his knowledge in a new way 
and the examination may therefore test some ability or abilities 
not always measured by either the typical essay examination or 
objective examinations. 

Because of interest value and other advantages, questions of 
the types described above should prove to be useful supplements 
to the more customary examination forms. 








CONSTRUCTION PUZZLE B AS AN ABILITY TEST 


GUY M. WILSON AND FAYE BURGESS 
Raytheon Manufacturing Company, Newton, Massachusetts 


A critical attitude toward one’s work is one of the safeguards 
to profitable research. The tools of research can not be taken 
for granted. Re-examination and re-sharpening of the tools is 
as necessary in the research laboratory as it is in the shop. 

Construction Puzzle B* is a form-board test. It is easily administered. 
It is essentially a non-language test. It does not 
consume much time. The scores appear to distribute normally. 
A first look at the results indicates a test that is entirely satis- 
factory. Accordingly, it was included in the regular battery of 
tests used to determine achievement levels in a war industry 
plant. 

The material for this test consists of a board about one-half 
inch in thickness, with dimensions 1014” X 84%”. There are 
cut-outs skillfully done. See Exhibit I. The task is a simple, 
but typical problem situation; the problem psychology developed 
by Ruger† is, no doubt, applicable. 

The instructions for the test are simple. The form in which 
the instructions are given is important. If the subject is given 
time to visualize the test, then it becomes more or less a test of 
visual memory. That is not the purpose of the Form Board 
Test in the present testing program. 

The subject is given only five or six seconds in which to 
observe the Form Board, little more than a glance during the 
process of giving the instructions. The instructions are: “This 
is a Form Board. Notice the cut-out parts. They fit in neatly. 
I will dump them out on the table and then your task is to put 
them back in as quickly as you can.” Thus, the subject gets 
the problem of the Form Board, without the opportunity to 
study it and fix it in mind. 





* Healy and Fernald: Construction Puzzle B. For brief history of this 
test, see Manual of Individual Testing by Augusta F. Bronner, William 
Healy, Gladys M. Lowe, Myra E. Shimberg; Little, Brown & Co., Boston, 
1928. 

† H. A. Ruger: “Psychology of Efficiency,” Archives of Psychology, 1st Q, 
Vol. 1, 88 p., June, 1910. 

EXHIBIT I. Showing in reduced size the details of the form board test, 
Construction Puzzle B. 


The present testing program was carried out in a War Industry 
for the purpose of determining relative ability in order that 
individuals might be placed on suitable tasks. Probably fifty 
per cent of the work in War Industry covered in this study is 
done by workers in teams. This makes it desirable that equal 
ability should be noted. In team production, the slowest 
worker on the team determines what the team will do. It is 
uneconomical to have a slow worker and a fast worker teamed up 
together. Furthermore, such a situation is usually annoying 
to the fast worker and discouraging to the slow worker. Accord- 
ingly, a determined effort was made to select a battery of tests 
which would give a fair index on the ability of the workers. The 
battery consisted of general intelligence tests and manipulative 
tests. Specifically, the battery consisted of the following tests: 





1) Stenquist Mechanical Aptitude Test 
2) Otis Quick Scoring, Test of Mental Ability 
3) Six-Minute Test of Mental Alertness (Wilson) 
4) Form Board Test, Construction Puzzle B 
5) Witmer’s Cylinder Test 
6) Minnesota Rate of Manipulation Test 
(a) Placing 
(b) Turning 
These six tests were followed by another special ability test 
according to the job on which the subject was likely to be placed. 
These special tests were: 


1) For the Mounts Departments, the Sleeve Threading 
Test 

2) For the Stem Room, the G. E. Stemhead Test 

3) For the Grid Room, the Pennsylvania Bi-Manual Test 


A basic idea underlying the testing program is that an individual 
should be given several chances. In our battery of tests, 
there are at least four general intelligence tests—the Stenquist, 
Otis, Six-Minute, and Form Board. In general, if one stands 
up on any one of these tests, one is entitled to that score as one’s 
rating on general ability. This is the plan used in a ‘field day’ 
event. A high jumper is given three trials to qualify, and three 
trials to finish. His record is the best of the six. 
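
The "field day" rule described above can be sketched in a few lines. This is only an illustration: the five-step scale and the sample ratings below are hypothetical and are not taken from the plant records.

```python
# Hypothetical five-step rating scale, ordered from best to worst.
SCALE = ["high", "above average", "average", "below average", "low"]


def best_rating(ratings):
    """Return the best rating earned on any one test (the 'field day' rule)."""
    return min(ratings.values(), key=SCALE.index)


# Illustrative ratings for one subject on the four general ability tests.
ratings = {
    "Stenquist": "below average",
    "Otis": "above average",
    "Six-Minute": "average",
    "Form Board": "low",
}
print(best_rating(ratings))
```

The subject is credited with the single best showing, just as the high jumper is credited with the best of six trials.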

The other tests mentioned are manipulative tests. Probably, 
the most basic one of the manipulative tests is the Minnesota 
Rate of Manipulation. That test has the advantage of being 
very simple. In placing, the preferred hand is used. In turn- 
ing, both hands are used and the work is alternated between the 
hands. It is based upon the principle announced by Dr. Ziegler* 
that rate of manipulation is a unit skill and, when once properly 
measured, can not be increased; any gain will be due to improved 
techniques. 

The discussion of the last few paragraphs develops the general 
testing situation in the particular War Plant covered in this 
research and provides a background for further study of the 
Form Board Test, Construction Puzzle B. 

The first norms for the Form Board Test were based upon 
sixty cases and recognized five points, as shown in Table I. It is 
always possible to find the Median and Quartiles, the high and 
the low. 

As cases accumulated, norms were refigured. Table II is 
similar to Table I, except that it shows the spread at each point. 
This spread at a point makes it easier to give the subject a rating 
index based upon his score. 





* W. A. Ziegler, Minnesota Rate of Manipulation Test, Educational Test 
Bureau, Minneapolis, Minn. 
In the meantime, some disturbing elements had crept into the 
situation. A glance at Tables I and II indicates the small 
spreads at the upper end and wide spreads at the lower end of 
the curve. For example, the top group in Table II covers the 
spread from sixteen seconds to twenty-six seconds inclusive, a 
total of eleven seconds. The first quartile group has a spread of 
thirty-two seconds; the median a spread of eighty-eight seconds; 
the third quartile a spread of ninety-four seconds; while the low 
group has a still wider spread. This was one disturbing factor. 
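
The spreads quoted above are inclusive counts of seconds and can be checked with a short sketch. The limits for the top group are those stated in the text; the limits used for the third quartile (2:27 to 4:00) are an assumption, inferred from the adjoining groups and the stated spread of ninety-four seconds.

```python
def to_seconds(clock):
    """Convert a time printed as 'm:ss' or ':ss' to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes or 0) * 60 + int(seconds)


def inclusive_spread(slow, fast):
    """Number of whole seconds covered by a range, counting both limits."""
    return to_seconds(slow) - to_seconds(fast) + 1


# (slowest time, fastest time) for each group of the October norms.
GROUPS = {
    "top": (":26", ":16"),
    "first quartile": (":58", ":27"),
    "median": ("2:26", ":59"),
    "third quartile": ("4:00", "2:27"),  # assumed limits, see note above
}

for name, (slow, fast) in GROUPS.items():
    print(name, inclusive_spread(slow, fast))
```

Under these limits the spreads come to eleven, thirty-two, eighty-eight, and ninety-four seconds, which agrees with the figures in the text.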


TABLE I. SHOWING PRELIMINARY FORM BOARD NORMS, JUNE, 1943, 60 CASES 

High ...............................   :16 
First Quartile .....................   :39 
Median .............................   :63 
Third Quartile .....................  2:26 
Low ................................ 13:25 


A study of the data, which were translated into graphic form, 
shows that fifty per cent of the cases at the top come within the 
first fifty units of the scale, counting seconds as units. The entire 
scale runs up to over 400 units. The lower part of the scale was 
not fully developed graphically. It runs into a group of people 
who did not solve the problem of the Form Board, that is, they 
gave up. 


TABLE II. SHOWING FORM BOARD NORMS, OCTOBER, 1943, 259 CASES 

High ...............................   :26- :16 
First Quartile .....................   :58- :27 
Median .............................  2:26- :59 
Third Quartile .....................  4:00-2:27 
Low ................................ 15:00-4:01 


It was observed that some of the people who took an excep- 
tionally long time on the Form Board, and even some who gave 
up entirely, had done well on other tests. A second question 
naturally arose: “If one does well on the Otis Test of Mental 
Ability, and on the Six-Minute Test of Alertness, and then falls 
down on the Form Board Test, what is wrong? Is it the fault of 
the Otis and Six-Minute Tests, or is it the fault of the Form 
Board Test?” Referring to Ruger’s Problem Psychology, we 
note that very capable people sometimes develop a complex or 
an emotional block, following which they are quite unable to 
solve a problem. This does not happen often, but, apparently, 
it is possible at all levels of ability. 

As a help in solving this second difficulty, and for possible 
values not otherwise disclosed, it was decided to give a second 
trial on the Form Board Test. The second trial followed immedi- 
ately after the first trial. In case the subject failed to complete 
the test, the problem was solved by the person administering the 
test; that is, the administrator put the parts in place, permitting 
the subject to observe the solution of the problem. 

The results on the second trial became very interesting. Out 
of the first one hundred ninety-one cases of subjects taking the 
second trial, five cases exceeded the time of the first trial. Two 
of these five rated above average on the Otis and Six-Minute 
Tests. Of the other three, one rated below average, and two 
low. (See Table II.) 

Of the seven cases in which the scores were low on both first 
and second trials, one was above average in intelligence, three 
below average, and three low in intelligence. However, the 
number of subjects requiring a long time to complete the test 
was greatly reduced in the second trial. In other words, the 
subject, once shown the point of the problem involved in the test, 
usually proceeded to solve the problem promptly. 

Nevertheless, there continued to be a tail-end of unsatisfactory 
results. A few months later, a special study was made 
of forty-four cases, taking two minutes or more up to nine 
minutes and five seconds on the Form Board. Of the forty-four 
cases, one moved from low on the first trial to 1st Quartile on the 
second trial; thirteen changed from low on the first trial to median 
on the second trial; ten changed from low on the first trial to 
3rd Quartile on the second trial. From the above statements, it 
is evident that Norms for the second trial had been constructed. 

Table III shows the Norms based upon four hundred forty- 
three cases for the first trial and one hundred ninety-one cases 
for the second trial. Table III is constructed according to the 
Trabue Technique, used by Ziegler in the construction of Norms 
for the Minnesota Rate of Manipulation Test. The middle 
point of Table III, ‘M,’ is the Median. Q1 and Q3 have the 
usual meanings. Fifty per cent of the cases are included between 
Q1 and Q3. The other steps, above and below the Median, are 
‘Q’ distances. (See Table III.) 


TABLE III. SHOWING NORMS AND OTHER SIGNIFICANT DATA ON THE FORM BOARD 
TEST, CONSTRUCTION PUZZLE B, FIRST TRIAL AND SECOND TRIAL. POINTS AND 
RANGE ARE IN SECONDS 


              First Trial                    Second Trial 
        Point   Range        Freq.    Point   Range        Freq. 

T+      :9      :11-           0      :8      :10-           0 
T       :14     :16-:12        2      :10+    :12-:11        1 
T-      :19     :21-:17        9      :13.5   :14-:13        9 
Q1+     :24     :28-:22       30      :15.5   :16-:15       23 
Q1      :36     :44-:29       61      :18     :19-:17       37 
M       :65     :85-:45      124      :23+    :26-:20       49 
Q3      :110    :135-:86      66      :31     :35-:27       29 
Q3-     :149    :165-:136     27      :42     :48-:36       22 
L+      :170    :185-:166      7      :59     :69-:49       10 
L       :212    :210-:186      8      :85     :100-:70       2 
L-      :270    :299-:211     99      :182    :262-:101      9 

        Total 433 Cases               Total 191 Cases 
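
As an illustration of how a rating index may be read from the first-trial ranges of Table III, the following sketch looks up the step for a given time in seconds. The function name is hypothetical, and the treatment of times slower than :299 (returned here as `None`) is an assumption, since the printed ranges stop at that point.

```python
# Slowest time (in seconds) belonging to each first-trial step of Table III,
# listed from the fastest step (T+) to the slowest (L-).
FIRST_TRIAL_LIMITS = [
    (11, "T+"), (16, "T"), (21, "T-"), (28, "Q1+"), (44, "Q1"),
    (85, "M"), (135, "Q3"), (165, "Q3-"), (185, "L+"), (210, "L"),
    (299, "L-"),
]


def first_trial_rating(seconds):
    """Return the Table III step whose range contains the time, else None."""
    for slowest, step in FIRST_TRIAL_LIMITS:
        if seconds <= slowest:
            return step
    # Assumption: times slower than :299 lie beyond the printed ranges.
    return None


for time in (14, 36, 65, 150, 250):
    print(time, first_trial_rating(time))
```

A table for the second trial could be built in the same way from its own ranges.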








It became more and more evident that Form Board Test 
results needed special attention, probably special interpretation. 
Possibly in some cases, the Form Board results needed to be 
omitted from the interpretation. Accordingly, the policy was 
adopted of omitting the Form Board Test results in the final 
evaluation of a person if such results were quite out of line with 
the results of the other tests. This, later, was changed as noted 
below. 

The correlations on the Form Board Test were low. This 
further indicated that there was some factor not properly con- 
trolled or evaluated. In the first table of inter-correlations, 
based upon four hundred twelve cases, the Form Board r’s were 
as follows: 
With the Stenquist                  +.163 
With the Otis                       +.118 
With the Six-Minute                 +.107 
With the Cylinder                   +.270 
With the Threading Sleeves          +.021 
With Schooling                      +.237 
With the Rating of Instructors      +.048 

Average r                           +.138 


These correlations are obviously low. It is interesting to note, 
however, that they are all positive. 
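
The average given above can be verified directly from the seven coefficients. The sketch below takes a plain arithmetic mean, which is assumed (not stated in the text) to be how the average was obtained.

```python
from statistics import mean

# Form Board correlations from the first table of inter-correlations.
FORM_BOARD_R = {
    "Stenquist": 0.163,
    "Otis": 0.118,
    "Six-Minute": 0.107,
    "Cylinder": 0.270,
    "Threading Sleeves": 0.021,
    "Schooling": 0.237,
    "Rating of Instructors": 0.048,
}

average_r = mean(FORM_BOARD_R.values())
print(round(average_r, 3))
```

The plain mean rounds to the +.138 reported.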

Correlations were pursued for a considerable length of time. 
Table IV shows the inter-correlations on the Form Board first 
and second trials with the Otis and Six-Minute Tests. Here, 
again, the correlations are low. It is possible that the foreign 
language element enters into the low correlations between the 
intelligence tests and the Form Board Test. In time, this will 
be checked out through the use of the Army Beta, a non-language 
test. 


TABLE IV. INTER-CORRELATIONS BETWEEN FORM BOARD TEST, FIRST AND SECOND 
TRIALS, AND OTIS AND SIX-MINUTE TESTS 

                                            No. Cases     r      P.E. 

Otis and Six-Minute .......................    160      +.640    .032 
Form Board, First Trial and Second Trial ..    159      +.375    .045 
Form Board, First Trial and Six-Minute ....    159      +.265    .050 
Form Board, Second Trial and Six-Minute ...    159      +.322    .048 
Form Board, First Trial and Otis ..........    159      +.111    .053 
Form Board, Second Trial and Otis .........    159      -.103    .053 
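
The probable errors in Table IV are close to what the customary formula for the probable error of a correlation coefficient, P.E. = .6745(1 - r²)/√N, would give. The text does not say which formula was used, so the sketch below is offered only as a check under that assumption.

```python
from math import sqrt


def probable_error(r, n):
    """Customary probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# (pair of tests, number of cases, r, reported P.E.) from Table IV.
TABLE_IV = [
    ("Otis and Six-Minute", 160, 0.640, 0.032),
    ("Form Board, First and Second Trial", 159, 0.375, 0.045),
    ("Form Board, First Trial and Six-Minute", 159, 0.265, 0.050),
    ("Form Board, Second Trial and Six-Minute", 159, 0.322, 0.048),
    ("Form Board, First Trial and Otis", 159, 0.111, 0.053),
    ("Form Board, Second Trial and Otis", 159, -0.103, 0.053),
]

for pair, n, r, reported in TABLE_IV:
    print(f"{pair}: computed {probable_error(r, n):.3f}, reported {reported:.3f}")
```

Each computed value falls within about one unit in the third decimal place of the reported figure.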


Thus, the performance of the Form Board as a measure of 
comparative ability in a War Industry Plant is still a matter for 
study. The present plan for interpreting the Form Board Test 
results is as follows: 

1) In general, use the norms for the first trial as a basis for 
rating. 
2) However, when the time on the second trial exceeds the 
time on the first trial, use the norms for the second trial as a basis 
for interpretation. 

3) When the first trial scores show a rating of Q3 or below, use 
the higher of the first or second trial ratings, provided final rating 
is not above M. 

4) When the results on the first trial are Q1 or better, and the 
results on the second trial are only a little lower (due to confusion 
resulting from turning the Form Board around for the second 
trial), use first trial results. 

5) In some cases, disregard Form Board results. There are 
cases in which Form Board results are quite out of line with other 
test results for the individual. 
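
Rules 1) to 3) of the plan above may be sketched as follows. This is one possible reading only: the proviso in rule 3) is taken to mean that the final rating is held at M when the better trial would rate higher, and rules 4) and 5) are left to the examiner's judgment and are not modeled. The function and variable names are hypothetical.

```python
# Rating steps of Table III, ordered from best (T+) to worst (L-).
STEPS = ["T+", "T", "T-", "Q1+", "Q1", "M", "Q3", "Q3-", "L+", "L", "L-"]


def interpret(first_time, second_time, first_rating, second_rating):
    """Choose a Form Board rating under rules 1) to 3), as read here.

    Times are in seconds; each rating is assumed to have been looked up
    beforehand in the norms for its own trial.
    """
    rank = STEPS.index
    # Rule 2: second trial slower than the first, so use second-trial norms.
    if second_time > first_time:
        return second_rating
    # Rule 3: first trial at Q3 or below, so take the better rating,
    # held at M if it would otherwise go above M (an assumed reading).
    if rank(first_rating) >= rank("Q3"):
        better = min(first_rating, second_rating, key=rank)
        return better if rank(better) >= rank("M") else "M"
    # Rule 1: otherwise the first-trial rating stands.
    return first_rating


print(interpret(150, 15, "Q3-", "Q1+"))
```

In the example, a subject rating Q3- on the first trial and Q1+ on the second is credited with M, since the better rating would go above M.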

The Form Board Test, Construction Puzzle B will continue to 
hold a place in the testing program of the plant in which this 
study was made. Testees like it; we like it. Yet, in approxi- 
mately twenty per cent of the cases the interpretation is difficult. 
Seventy-four cases out of four hundred thirty-four go above five 
minutes (300 seconds) on the first trial. Nearly ten per cent of 
the cases go above five minutes on the second trial,—sixteen out 
of one hundred seventy-five cases. As a final recourse, the Form 
Board Test results are dropped out of consideration when they 
are clearly out of line with the results from other tests in our 
test battery, as indicated in 5) above. 














BOOK REVIEWS 


GEORGE H. ESTABROOKS. Hypnotism. New York: E. P. Dutton 
& Co., Inc., 1943, pp. 249. 


There is today an increasingly widespread interest in hypno- 
tism. The Army is using hypnosis successfully and it is being 
publicized from many sources. Life magazine has popularized 
hypnosis in picture form. Reader’s Digest has had a number of 
articles about hypnosis concerning its present uses for all 
purposes. Even a psychoanalytic institution like Menninger’s 
has employed a hypnotist to work along with psychoanalysts. 
Hypnotism is coming into its own, or, more accurately, it is being 
tried out by psychiatrists and psychologists and there is, as a 
result, a live interest in the subject. Hence, because we have 
with us many sensationalists, hypnotism has been staged and 
popularized on the stage. Because of this increase in popularity 
there is need for an intelligently and reliably written book 
about the subject. 

Estabrooks’ book on Hypnotism is well written; the style is 
clear and the content fairly reliable. There is no attempt to 
explain the whole process of hypnotism, its causes and effects, in 
any single hypothesis. There is, instead, a fairly lucid descrip- 
tion of hypnotism, how it is induced, some common phenomena 
in hypnotism, posthypnotic suggestion and auto-suggestion, 
some curious states related to hypnotism, medical uses of it, hypnotism 
in crime, and, in addition to these, a couple of popular, 
modernized chapters; one is called “Hypnotism in Warfare” and 
the other, “This Man Hitler.” The chapters on the use of hypnotism 
in crime and warfare and the one on Hitler are probably 
intended by the author as a way of arousing interest in the book. 

The author sees limitations as well as possibilities for the use 
of hypnotism. He can see the use of it as an anesthetic for some 
purposes at some times. He can see its possible use as a sub- 
stitute for metrazol; at least it can reduce the pain of shock 
therapy. However, metrazol is on the way out for use in shock 
therapy. He also sees the limitations or the inadvisability of 
using hypnotism. For example, perhaps it is a fact that hypno- 
tism could be used for reducing the pain of cancer, but Esta- 
brooks is realistic enough to realize that removing the pain does 
not remove the cause, and it might even hide the cause and thus 
help the illness prevail. Also, he is aware of the fact that hypnotism 
can help explain many forms of insanity, but it cannot necessarily 
help cure them. He claims for hypnosis that it is as good 
a method as any for treating psychoneuroses and that some of it 
explains psychoanalysis. 

In the book, as he writes it, a person who wants to familiarize 
himself with hypnotism can learn a few interesting facts about 
it and its uses. He can learn, for example, that hallucinations 
are a final test of it, that hypnotism is not a variety of sleep, that 
a person who tries to use hypnosis without realizing that subjects 
have ethical limits which will be manifested even under hypnosis 
will fool himself, that the cracked bones that sometimes have 
been reported as resulting from metrazol treatments can be 
reduced if not eliminated by the use of hypnosis. Hypnosis 
sometimes lasts for many, many hours—sometimes for years. In 
explaining these and similar, related concepts, Estabrooks does 
an excellent job. The chapter on “This Man Hitler,” however, 
where Hitler is described as the hypnotist par excellence, is 
extremely questionable and, in the long run, is not likely to add 
to the interest in the book. His suggestions for the use of hypnosis 
in warfare as a sort of spy system are not likely to take any 
too well either. It is more than merely a problem of ethics 
involved. All in all, however, he has written a book that should 
be valuable for general readers and also for psychologists. The 
chapters on Hitler and warfare they can take with many grains 
of salt as in a critical sense they warrant. H. MELTZER 

Psychological Service Center, St. Louis, Missouri 


HELEN HALL JENNINGS. Leadership and Isolation. New York: 
Longmans, Green, and Co., 1943, pp. 240. 


Relatively few problems in human relations are more important 
than the problem of who chooses to live with whom, who rejects 
whom, and how can human beings learn to enter into relation- 
ships as human beings and organize humanized relations. Any 
knowledge or insights that can yield information and techniques 
and that will be helpful to leaders, at least, in teaching others how 
to enter into relationships, how to communicate experiences, are 
of significance today. It is with this type of problem that Dr. 
Jennings concerns herself in her book, Leadership and Isolation. 
More specifically, some four hundred girls in a training school 
were asked to express their mutual acceptances and rejections of 
all the individuals in the group at one given date and eight months 
later. They were asked to express the names of people they 
would like to live with, work with, play with. Only the living 
and working expressions were analyzed to show insight on the 
problem of interpersonal choice in relationship to personality 
characteristics. Behavior characteristics are those largely deter- 
mined or assigned by house mothers. With the use of such data, 
Jennings has done an interesting and thorough job in studying 
the problem of choice more thoroughly than ever done by the 
sociometric approach. Her book, then, is a contribution to 
sociometric research, also a contribution to methods in social 
research in general, and not merely a report of the choosing 
process in the four hundred girls studied. 

The problem of choice as investigated in this particular volume 
is broken up into two approaches; namely, emotional expansive- 
ness and social expansiveness. By emotional expansiveness the 
author means the number of persons for whom the individual 
expresses positive choices; and by social expansiveness she means 
the individual and social initiative measured by the extent of a 
subject’s social-contact-range. In the institution studied, of 
course, one advantage the author had was that the population 
was constant enough so that it was possible to examine the same 
individuals and relatively similar, if not almost identical, social 
climate. The relationship of this type of approach to some of 
Kurt Lewin’s studies on social climates of democracy and autoc- 
racy is fairly apparent. The concepts used by Jennings are 
quite in accord with the field-space concepts of Lewin and his 
followers. 

Leaders are not perfect individuals. That is one of the more 
interesting results found by the author. Not every leader is 
invariably a pleasant person. In fact, in their conduct toward 
their superiors they are frequently severe in their criticism. Or, 
as Murphy in the preface to the volume aptly puts it, it becomes 
clear how a ‘dominant’ personality can yet be a source of security 
to others, “how the protection of one’s own ego-needs may 
involve not a violation of, but a concomitant need to protect the 
ego-needs, the status cravings, of others. For the effectiveness 
of these over-chosen individuals in maintaining their own position 
is almost inseparable from their effectiveness in supporting 
the needs of others within the group.” The problem of choice 
status and the interrelationships involved is the type of study that 
can be used to advantage in industry as well as in a training 
school, where the labor turnover would not be large if sustained 
analysis of choice processes could be undertaken and more inter- 
esting insights perhaps in a training-school group developed. 

All in all, this is an interesting study, an interesting follow-up 
of Moreno’s sociometric approaches. It represents a more 
intensive approach to the whole problem of choosing in interpersonal 
relationships. And the fact that these over-chosen and 
under-chosen are related to behavior characteristics makes 
of it a more enlightening study and somewhat broader than the 
ordinary sociometric approach where choices are merely asked 
for without any correlations to behavior characteristics. Con- 
sidering the fact that Moreno’s basic concept is that effective 
functioning depends on spontaneity, one wonders why, instead of 
merely ratings by house mothers, the sociometric research group 
does not depend more on either dramatic or projective techniques 
for evaluating spontaneity and naturalness in human beings and 
correlating that, rather than ratings, with sociometric choices of 
the people who are leaders or who turn into isolationists, the 
over-chosen and the under-chosen. Aside from that additional 
suggestion, the book is a valuable contribution to sociometric 
research and should be of use to individuals who do not think of 
themselves as being exclusively sociometrists. H. MELTZER 

Psychological Service Center, St. Louis, Missouri 




















